Standards exist for many aspects of scientific and technical information and its management. This paper examines standardization from the perspective of an ‘ideal information scenario’. It focuses both on the older, well-established standards and on those standards currently under development which are most important for information management. It concludes with a proposed approach for planning standards activity.


1. Introduction 

Development in Science and Technology is critical to academic, industrial, commercial and national success. Effective scientific research and technology transfer cannot be confined within national boundaries; they compel the participants to be part of an international information network. Innovations in technology are stimulating profound changes to the capability and operation of this network. Rapid peer-to-peer communication, dynamic and interactive access to international data and documentation, and the virtual library are all on the threshold of becoming reality. It is in this context that the role of standards must be examined.


2. User requirements for scientific & technical information management

The long-term view requires that all aspects of the information management process be capable of integration at the desktop through simple-to-use, intuitive interfaces. In his book Shaping the Future, Peter Keen, a leading IT strategist, provides a framework for thinking about the degree of flexibility required for sharing and using information, using the concept of ‘reach and range’. ‘Reach’ covers the question ‘who has access to the information?’. ‘Range’ asks ‘what services or information are offered?’. Figure 1 shows how these concepts can be used to depict the ideal information scenario.

Users, from their desks, wish to exchange simple messages with relevant contacts, both within their work group and much further afield. If working jointly with others (for instance, on a research project or a patent specification), the user will wish to review data; to build databases; to exchange documents in draft; to annotate them with any changes required; and to achieve a finished product that meets the needs of all contributors. The completed document may be for internal publication, either locally or throughout an organization. The document may be destined for external publication in a scientific journal that will require peer review, format standardization and further editing. It is now possible to conceive of all these processes being handled electronically and of the publication being made available through an on-line electronic journal, e.g. the On-line Journal of Current Clinical Trials.


[Figure 1 graphic: Reach (vertical axis, rising from Individual, Work Group, Department and Institution through other institutions and national, regional and multi-country levels to Anyone, Anywhere) plotted against Range (horizontal axis: Message exchange; Interactive document creation; Publication & communication; Identification of previous work; Access to documents and data).]

Figure 1: The ideal information scenario


For effective research and development, it is essential that the user can identify and access relevant information that already exists. Interactive, instantaneous access to an enormous variety of sources is already a reality, dependent only on the strength of local collections, the reliability and affordability of telecommunications access to remote sources, and the availability of equipment and funds. However, the proliferation of search languages and indexing policies imposes a considerable training burden on the user. In addition, maintaining a current knowledge of relevant sources requires a considerable investment of time. What users really need are access routes which guide them to relevant sources while requiring them to use only one set of search instructions.

Once relevant material has been identified, it is likely to be in the form of a bibliographic reference, with or without an abstract. If the user is lucky, this will contain the information required. More often than not, access to the full document will be required. This may be a journal article, a chapter in a book, a national report, or a technical or patent specification. Ideally, the user will want to request it electronically and to be able to browse through it on-line, selecting relevant information to note, to retain and to use. The user will have little or no interest in the location of the stored document, unless this affects cost or usability in any way. The user will not, of course, want access only to textual material in this way, but also to the figures, graphs, photographs, tables and formulae which feature in the original publication.

These information needs are not just relevant to 
Science, Technology and Medicine, but also to selling, 
marketing, production — in short to any industrial, 
commercial or governmental activity which requires 
interactive communication, exchange of information, 
and access to already existing information. 

This ideal scenario is no longer just a pipe dream. 
Although its complete existence belongs in the future, 
there are many aspects of it that already exist on a 
smaller scale. Critical to its full realization, which will 
maximize the benefit of recorded knowledge, is the 
adoption of standards that will facilitate the transfer 
and organization of information and the development 
of products to support information handling. 

Before looking at the standards themselves, let us 
look at the various routes which led to their evolution. 


3. Standards & standards organizations 

Standards fall into four principal categories: de jure standards (those promulgated by a legislative body and which therefore have the force of law); de facto standards (those supported by a wide range of manufacturers or suppliers); proprietary standards (those which are specific to a particular product range); and voluntary consensus standards.

At the international level, many of the standards arising from the CCITT (International Telegraph and Telephone Consultative Committee) constitute de jure standards covering tariffs for telegraphy and telephony and standards for the interconnection of national telecommunications systems.

De facto standards such as the PostScript page description language or the IBM PC become standards simply because they begin to be used so widely. These standards document, define or describe an existing reality (or problem solution) so that others can easily reproduce this reality (or solve a similar problem), thereby avoiding duplication of effort. Within the IT arena, consortia of companies have been particularly active on the de facto standards front, e.g. the Open Software Foundation. Companies may also make an internal standard publicly available to widen the market for their products.

Proprietary standards are also important in information activity. IBM’s Systems Network Architecture (SNA) is a widely used protocol for wide area data communications. PC networks may also use proprietary standards, e.g. NetWare from Novell. Organizations often adopt proprietary standards, e.g. Microsoft Word, to facilitate communication and information transfer.

Many of the standards relevant to information management are voluntary consensus standards developed through national or international discussion and compromise between accredited organizations. Officially, these types of standard are defined as: ‘a technical specification or other document, available to the public, drawn up with the co-operation and consensus or general approval of all the interests affected by it, based on the consolidated results of science, technology, and experience, aimed at the promotion of optimum community benefits, and approved by a body recognized at a national, regional or international level’.

Life is, of course, not that simple. Voluntary consensus standards may have their origin in a de facto standard and may in time be used as the basis for a de jure standard. The adoption of a voluntary consensus standard by government will give it the force of a de jure standard. For instance, GOSIP (the US Government’s Open Systems Interconnection Profile) establishes a de jure standard advising potential vendors which interconnection protocols are regarded as essential to satisfy any federal agency purchaser of computer equipment, although many of the standards within GOSIP are ISO Open Systems Interconnection standards, i.e. voluntary consensus standards.

Standards aim to promote trade by the removal of 
barriers caused by differences in national practices; to 
protect consumer interests through adequate and 
consistent quality of goods and services; to promote 
economy in human effort, materials and energy in the 
production and exchange of goods; to promote the 
quality of life, safety and health and the protection of 
the environment; to provide a means of communication 
between all interested parties; and to encourage co-operation in economic, intellectual, technological and scientific endeavours.

Virtually every nation produces standards, as do regional groups of countries, e.g. the European Community, and international organizations. The standards-making process is complex, involving numerous committees, consultation processes, drafting, editing, ratification and publishing. Then the cycle of revision and harmonization with other standards begins. The process of standardization more often than not lags behind the need for the standard, producing risk both for manufacturers developing products and for organizations anxious to ensure the eventual interoperability of equipment and systems.


Open or de facto standards are often, therefore, in advance of voluntary consensus standards, coming, for instance, from a research community anxious to make progress in co-operation or from manufacturers wishing to stimulate product development.

At the international level, three groups are key to the development of standards. The International Organization for Standardization (ISO) is an independent, non-governmental body whose purpose is to promote international trade by encouraging the adoption of common standards for goods and technology. Its membership is drawn from the national standards bodies. A variety of committees handle standards initiatives in different areas, e.g. ISO TC46 covering information and documentation, and ISO TC154 covering electronic data interchange. The International Electrotechnical Commission (IEC) covers the electrical and electronic fields. A Joint Technical Committee (JTC1) of ISO and IEC was established in 1987 to handle all information processing system standards. ISO and IEC are mirrored at the national level by standards bodies, which represent their countries on these groups. International standards for telecommunications are developed by the CCITT (International Telegraph and Telephone Consultative Committee), a sub-group of the International Telecommunication Union (ITU), itself an organization set up under UN treaty. ISO/IEC and CCITT work together to ensure that their recommendations are closely aligned and compatible with one another.

Standards groups may also exist to reflect regional interests. CENELEC (Comité Européen de Normalisation Electrotechnique) and CEN (Comité Européen de Normalisation) develop European Community standards for both information processing and information management, and have received tremendous impetus from the creation of the Single European Market. ISO has helped to establish a number of these regional standards organizations, including the Pacific Area Standards Congress (PASC).

National and international groups which have key roles in influencing and developing information standards include IFLA (the International Federation of Library Associations), major national libraries such as the US Library of Congress and the British Library, and the UK and Canadian Library Associations; these have been the predominant force behind bibliographic standards. NISO, the US National Information Standards Organization, has developed the Z39 series of standards, covering a range of information and library activities. Some of these standards are surprising, for example Z39.41, a standard for formatting information on the spines of books, and Z39.13, which covers how to describe books in advertisements, catalogues, promotional material and book jackets.

A basic principle of standardization is that a standard should be created or endorsed at the highest possible level to secure maximum benefit. Within the
information arena it is critically important that, 
wherever possible, standards should be internationally 
endorsed, since information is an international 
commodity. 

Standards do not, however, serve as an absolute 
guarantee of consistency. Standards, when initially 
published, may be ambiguous and may allow the user 
certain options that need to be clarified by the user 
organization or the national regulatory body. This 
flexibility may have dangerous longer-term implications for information exchange.
Within the scope of this paper, exploration of all the standards relevant to scientific and technological information is impossible. By focusing briefly on the areas of greatest excitement to me, it may just be possible to show some of the building blocks for the ideal information scenario. I have classified these areas into five functional groups:

• transfer and communication of information
• resource sharing
• representation of information
• access to information
• the human-computer interface.

Many of the standards which are fundamental for information handling are developing from the Information Technology arena but must be understood by the information and library community.


4. Standards to support the transfer & communication of information

Telecommunications standards

Telecommunications standards have been studied and worked on for many years, with CCITT and ISO performing key roles.

ISO's Open Systems Interconnection (OSI) model covers a wide range of formal specifications (or protocols) for ensuring that computers are able to communicate over wide area public telecommunications networks, such as PSS, as well as over local area networks. The OSI communication model is represented in seven layers. Each layer may be seen as a process or program which communicates with the corresponding process on another machine according to the protocol or standard governing that layer. The lowest (physical) layer is concerned with the network cabling. The data link and network layers deal with moving packets of data within the network and selecting a route, and include protocols such as X.25. The transport layer deals with the reliable movement of data from one host computer to another across a network or a group of networks connected by gateways. The upper layers of the model are concerned with common applications. The session layer co-ordinates the communications interchange between co-operating application processes. The presentation layer ensures compatible syntax among the communicating processes by adjusting data structures, formats and codes. The application layer is the user interface to the host computer's communication method. Applications do not reside in this layer; rather, the layer allows an application to gain access to the services provided by the architecture. Key applications for information transfer are CCITT X.400 (for message handling or electronic mail), CCITT X.500 (for directory services) and FTAM (ISO 8571, for file transfer, access and management).
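The layering idea can be made concrete with a toy sketch. It assumes a made-up text header per layer (real OSI protocols exchange binary protocol data units, and not every layer adds a header in practice): on the way down each layer wraps whatever it receives from the layer above, and on the way up each layer removes only its peer's header.

```python
# Toy model of OSI-style layering. The "[layer]" header format is a
# hypothetical illustration, not any real OSI encoding.

OSI_LAYERS = [
    "application",
    "presentation",
    "session",
    "transport",
    "network",
    "data link",
    "physical",
]


def send(message: str) -> str:
    """Pass a message down the stack, each layer adding its own header."""
    frame = message
    for layer in OSI_LAYERS:  # top (application) to bottom (physical)
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Pass a frame up the stack, each layer checking and removing its peer's header."""
    for layer in reversed(OSI_LAYERS):  # bottom (physical) to top (application)
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header} header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = send("ILL request")
    print(wire)
    print(receive(wire))
```

Here `send("ILL request")` produces a string that starts with `[physical][data link]` and ends with `[application]ILL request`, and `receive` recovers the original text. The point of the design is that each layer depends only on its peer following the same protocol, not on the details of the layers above or below it.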

Essentially, OSI defines the properties and behaviour of a message-passing environment, enabling the exchange of mutually intelligible messages (or protocol data units) between applications on different machines so that they can perform a shared task. The advantages of OSI are therefore that:
• it provides a flexible architecture in the context of which OSI products can be developed
• it allows resources to be widely shared
• it allows choice in the selection of multivendor products.
Open or de facto standards (those supported by a wide range of manufacturers) are also important in data communications. The TCP/IP protocol suite (developed for the US Department of Defense) is supported by a number of hardware and software manufacturers. The US-based Internet uses TCP/IP but is expected to support TCP/IP and OSI in parallel within the next few years. Proprietary networking standards also exist; IBM's SNA (Systems Network Architecture), for example, is a widely used protocol for wide area data communications.

For users of networks, who are largely dependent on their organizations' choice of interconnection standards, a major concern is the co-existence and interoperability of the major players, OSI and TCP/IP. The technical issues are complex but must be resolved if the ideal information scenario is to be realized.

Developments in communication capacity are also exciting. Under the direction of the CCITT, the standards for ISDN (Integrated Services Digital Network) have been evolving since the late 1970s. ISDN is replacing the analogue telephone network and is capable of handling not just voice but also facsimile images, text files and data records. The speed of transmission is also being greatly increased. ISDN offers tremendous opportunities for information transfer. ISDN-2 can provide two separate channels down the same wire, e.g. voice on one channel with simultaneous data or video transfer on the other. One exciting possibility is the concept of a ‘white board’. A teleconference could use one channel for voice and the other for data exchange. The person at the screen can draw as if on a white board; participants can see this image, comment on it, and change it by drawing on their own PCs.


X.400 
X.400 has considerable implications for information management. It encompasses a set of standards which govern the interchange of messages between different electronic mail systems, thus enabling users in different organizations to communicate. The X.400 format defines the structure of a message as consisting of:
• the envelope, containing the destination and return addresses, the submission time and other attributes, such as priority
• the contents, comprising the message heading and the body of the message. The body is the text of the message; the heading contains the distribution list and details such as whether the message is private or whether a reply has been requested.
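The envelope/content split described above can be sketched as plain data types. This is a minimal illustration under stated assumptions: the field names are hypothetical, addresses are simplified to strings (real X.400 uses structured O/R addresses), and nothing here reproduces the actual X.400 ASN.1 definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Envelope:
    """What the message transfer system reads in order to route the message."""
    originator: str               # return address (simplified to a string)
    recipients: list[str]         # destination addresses
    submission_time: datetime
    priority: str = "normal"      # one of the 'other attributes'


@dataclass
class Heading:
    """Part of the content: meant for the reader, not for the transfer system."""
    distribution_list: list[str]
    subject: str = ""
    private: bool = False
    reply_requested: bool = False


@dataclass
class Content:
    heading: Heading
    body: str                     # the text of the message


@dataclass
class Message:
    envelope: Envelope
    content: Content


def next_hops(message: Message) -> list[str]:
    """A transfer agent routes on the envelope alone; the content stays opaque."""
    return list(message.envelope.recipients)
```

Keeping routing data in the envelope is what lets the same transfer system carry very different contents, which is why X.400 can also serve as a vehicle for structured messages between applications.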
Electronic mail is already widely used within and between organizations, and for scientific and technical information providers it offers a reliable, time-independent route for unstructured service requests in the form of simple messages, e.g. requests for information. Bulletin boards provide an almost instantaneous method for sharing new information with a group of users or collaborators, e.g. directories of resources and services. For the academic community in many countries, electronic mail already simplifies collaboration by providing a quick means of keeping in touch. The ultimate goal for X.400 is a universal mail system, including public services as well as vendor-supplied private services.

By offering a standard vehicle for the communication of messages between computer systems, X.400 can also carry any other digital information, for instance fax, voice, graphics or images. In addition, it provides the opportunity for developing distributed applications and, most importantly, for the communication of structured messages between applications, for instance for the management of document requests and inter-library loans (ILL). X.400 also offers the possibility of electronic document delivery. In my own organization an X.400 e-mail system is now used daily for the distribution of alerting services and search requests to customers and for the exchange of documents with subsidiaries. Other uses of e-mail include the computer conference, where a message sent to a single address is redistributed to a list of conference participants. This facility is growing in popularity. Some conferences are moderated, with messages vetted for suitability; others are not. The library community is one of those communicating in this way. Many conference systems archive messages automatically and the archives can be searched. Some information centres monitor conferences and archive messages for their users. E-mail based bulletin boards have also been developed to share new information with users, e.g. directories of sources and services.


EDI 

The development of standards for EDI (electronic data interchange) is also of importance to the information exchange infrastructure. Proprietary standards exist, but EDIFACT (Electronic Data Interchange for Administration, Commerce and Transport) is evolving under the auspices of the ISO and the United Nations Economic Commission for Europe (UN/ECE). EDIFACT provides for standard data structures, in terms of format and data content, for specific business transactions, e.g. orders, quotations and invoices. There is a parallel US EDI syntax, X12, and steps are being taken to align the two. The standard has particular relevance to the book trade for speeding up transactions between publishers, suppliers, bibliographic agencies and libraries, i.e. the whole administration of the publication, ordering and indexing process. Bibliographic information can be provided to a library for downloading and review. A decision to order can be transmitted to the publisher, the invoice and payment exchanged electronically, and the book record retained in the library’s catalogue. EDI messages can be delivered via X.400-based systems.
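EDIFACT messages are flat text built from a small set of service characters. Under the ISO 9735 defaults (used when no UNA segment overrides them), an apostrophe ends a segment, `+` separates data elements, `:` separates components, and `?` releases (escapes) the next character. A minimal parsing sketch follows; the sample order fragment is hypothetical, omits the UNB/UNZ interchange envelope, and has not been validated against any EDIFACT directory.

```python
from __future__ import annotations

# ISO 9735 default service characters (assumed; a UNA segment may change them).
COMPONENT_SEP = ":"
ELEMENT_SEP = "+"
RELEASE = "?"
SEGMENT_TERMINATOR = "'"


def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on sep, ignoring separators protected by the release character.

    Release sequences such as "?+" are kept intact so that the inner
    levels (elements, then components) still recognise them.
    """
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == RELEASE and i + 1 < len(text):
            current.append(text[i : i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text: str) -> str:
    """Drop each release character, keeping the character it protects."""
    out: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_segments(message: str) -> list[tuple[str, list[list[str]]]]:
    """Return (tag, elements) pairs; each element is a list of components."""
    segments = []
    for raw in split_unescaped(message, SEGMENT_TERMINATOR):
        raw = raw.strip()  # tolerate line breaks between segments
        if not raw:
            continue
        tag, *elements = split_unescaped(raw, ELEMENT_SEP)
        segments.append(
            (tag, [[unescape(c) for c in split_unescaped(e, COMPONENT_SEP)] for e in elements])
        )
    return segments


# Hypothetical order fragment; note the released ':' and apostrophe in the FTX text.
SAMPLE = "UNH+1+ORDERS:D:96A:UN'BGM+220+PO123'FTX+AAI+++Rush order?: urgent?'s'UNT+4+1'"
```

Escapes are kept through every level of splitting and removed only at the component level, so a released separator is never mistaken for structure.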


FTAM

FTAM (File Transfer, Access and Management) is a complex protocol which supports the manipulation of files over a network. It contains facilities for accessing files held remotely, for file transfer and for special applications such as printing. It thus offers facilities for the on-line bulk transfer of records, for catalogue record exchange, for instance, or for the delivery of large volumes of search results.


5. Standards to support resource sharing 

MARC 

EDI standards such as EDIFACT are beginning to provide for resource sharing between publishers and information centres. The exchange of bibliographic material is even better established as a co-operative activity within the field of librarianship. This sharing reduces cataloguing effort, sustains cataloguing standards, and widens access to library holdings both nationally and internationally. ISO 2709, adopted in 1973, is the international standard specifying a record structure for the exchange of bibliographic information on magnetic tape. The standard specifies the structure of the record but not its content. MARC, which complies with ISO 2709 and indeed was a stimulus for the international standard, was developed by the US Library of Congress. The work on MARC fostered an enormous growth in shared cataloguing activity. The MARC format includes up to 61 data elements and is compatible with the bibliographic guidelines AACR2 and Dewey. The format comprises the description of bibliographic data and the data itself.
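The ISO 2709 structure (structure, not content) can be shown with a small builder and parser. An ISO 2709 record is a 24-character leader, a directory of fixed-length entries (3-character tag, 4-digit field length, 5-digit starting position), and the data fields, each closed by a field terminator, with a record terminator at the end. The sketch below is a simplification under stated assumptions: it counts characters rather than octets (so values must be ASCII), fills the implementation-defined leader positions with MARC-style values (`nam`, `22`, `4500`), and treats each field's value as opaque. The sample record is hypothetical.

```python
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_record(fields: list) -> str:
    """fields: (tag, value) pairs; ASCII only, since lengths are counted in characters."""
    directory, data, offset = [], [], 0
    for tag, value in fields:
        if len(tag) != 3:
            raise ValueError("tags must be three characters")
        chunk = value + FIELD_TERMINATOR
        # Directory entry: tag, 4-digit length, 5-digit start relative to the data.
        directory.append(f"{tag}{len(chunk):04d}{offset:05d}")
        data.append(chunk)
        offset += len(chunk)
    directory_str = "".join(directory) + FIELD_TERMINATOR
    base_address = 24 + len(directory_str)          # where the data fields begin
    record_length = base_address + offset + 1       # +1 for the record terminator
    # Leader: length, MARC-style codes, indicator/identifier lengths, base address, entry map.
    leader = f"{record_length:05d}nam  22{base_address:05d}   4500"
    return leader + directory_str + "".join(data) + RECORD_TERMINATOR


def parse_record(record: str) -> list:
    """Recover (tag, value) pairs by walking the directory."""
    if int(record[:5]) != len(record):
        raise ValueError("record length in leader does not match")
    base_address = int(record[12:17])
    directory = record[24 : base_address - 1]       # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i : i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        chunk = record[base_address + start : base_address + start + length]
        if not chunk.endswith(FIELD_TERMINATOR):
            raise ValueError(f"field {tag} is not terminated")
        fields.append((tag, chunk[:-1]))
    return fields


# Hypothetical record: a control number and a title field with one subfield.
SAMPLE_FIELDS = [("001", "demo0001"), ("245", "10" + SUBFIELD_DELIMITER + "aAn example title")]
```

Because the directory carries every field's length and position, a receiving system can locate any field without understanding its content, which is exactly why ISO 2709 can underpin several differing MARC formats.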
Since its introduction, other bibliographic exchange formats adhering to ISO 2709 have been developed, for instance regional MARC formats and UNIMARC. UNIMARC is overseen by the International MARC Network Committee, a committee set up by the Conference of Directors of National Libraries, and is maintained by IFLA's UBCIM (Universal Bibliographic Control and International MARC) programme. The function of UNIMARC is to provide a switching format between national formats. A national bibliographic agency may, for example, issue records within its own country in its national format but issue those same records in UNIMARC to another country, which must be able to convert UNIMARC records to its own national format. The content designations of UNIMARC's fields and subfields are based on other standards developed by the UBCIM programme. What does not exist today is a single version of MARC, with the consequent need for programs to convert from one ISO 2709-based format to another, and some confusion for those considering which standard to adopt.
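The case for a switching format can be put as a simple count (an illustration added here, not an argument made in the original). With $n$ national formats, direct conversion needs a separate program for every ordered pair of formats, whereas a switching format such as UNIMARC needs only one converter into it and one out of it per national format:

```latex
\[
  \underbrace{n(n-1)}_{\text{direct, pairwise}}
  \quad\text{vs.}\quad
  \underbrace{2n}_{\text{via a switching format}},
  \qquad
  n(n-1) > 2n \iff n > 3 .
\]
```

For ten national formats this is 90 conversion programs against 20; the two approaches only tie at $n = 3$.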
MARC and UNIMARC were developed for the library sector and focus on discrete publications. The UNISIST reference manual for machine-readable bibliographic descriptions was developed in response to the needs of secondary services and gives equal prominence to journal articles, monographs and serial titles. The UNESCO Common Communication Format (CCF) is another format developed for the exchange of bibliographic records of both monographs and serials, and is particularly suitable for items such as journal articles and contributions in journals, and for bibliographic subsets, e.g. conference proceedings, found in secondary services. CCF uses the record structure in ISO 2709 and was developed to facilitate the exchange of records between different systems.


Interlibrary loans (ILL)

The availability of information on the holdings of other libraries is a powerful stimulus for requests to obtain that material in original or photocopied form. There are many systems for the transmission of requests to document suppliers, many of them proprietary. The document ordering systems offered by database hosts and the OCLC (Online Computer Library Center) ILL system in the USA, which processes some 6 million records per annum, are just two examples.

Excitingly, an OSI protocol for ILL requesting has reached the draft international standard balloting stage: ISO DIS 10160/10161, Interlibrary Loan Service Definition and Protocol Specification. Trial implementations are being developed and evaluated (e.g. project ION and the NLC/BL/LC pilot study), and various manufacturers are being encouraged to incorporate these protocols within their products. The National Library of Canada, with the objective of facilitating resource sharing in a large and sparsely populated country, has been the pioneer in the development of the ILL protocol.

The ILL standard is an OSI application-level protocol. The two components of the standard are:

• the service definition, which describes the functionality that a user of the protocol can exploit
• the protocol specification, which defines how these services are provided.

The standard only addresses the communication-oriented aspects of the ILL process; it does not deal with local internal ILL processing. The standard's content and development status are fully described by Dempsey.

Once the standard is in place, systems utilizing it will be able to communicate with one another. A requester will therefore be able to request an item from an institution by providing item identification information (e.g. author, title), when the item is needed, by whom, the destination, and whether the item itself or a photocopy is required. The requester can ask for a cost estimate for providing the item, ask where the item is held, and request that the item be reserved if not immediately available. If the item cannot be provided by the institution receiving the request, the responder can reject the request or forward it to another institution, either as a new request or using the original ILL transaction identifier. Requesters can themselves control the use of forwarding, as well as provide a list of potential respondents to whom a request might be sent and of those who have already been contacted. An item may then be dispatched, an overdue notice or recall sent, a renewal requested, and so on. The standard offers tremendous opportunities for simplifying and reducing the labour involved in ordering materials from other sources.
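The request lifecycle described above behaves like a state machine shared by requester and responder. The sketch below is loosely modelled on the actions the text lists (forward, reject, dispatch, receive, renew, overdue, recall, return); the state and event names are hypothetical simplifications, not the state tables of ISO 10160/10161.

```python
# Hypothetical, simplified ILL transaction states; the real protocol is much richer.
TRANSITIONS = {
    ("requested", "forward"): "forwarded",
    ("requested", "reject"): "unfilled",
    ("requested", "ship"): "shipped",
    ("forwarded", "forward"): "forwarded",   # passed on again, same transaction id
    ("forwarded", "reject"): "unfilled",
    ("forwarded", "ship"): "shipped",
    ("shipped", "receive"): "received",
    ("received", "renew"): "received",
    ("received", "overdue"): "overdue",
    ("received", "recall"): "recalled",
    ("received", "return"): "returned",
    ("overdue", "return"): "returned",
    ("recalled", "return"): "returned",
}


class IllTransaction:
    """Tracks one request; the identifier survives forwarding unchanged."""

    def __init__(self, transaction_id: str):
        self.transaction_id = transaction_id
        self.state = "requested"
        self.history = ["requested"]

    def apply(self, event: str) -> str:
        """Move to the next state, rejecting events the current state does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```

A table of allowed transitions makes it easy for both ends of a transaction to reject out-of-order messages, for example a return before the item has been received.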

Two options are available for structuring the data in the messages exchanged by the protocol: ASN.1 (Abstract Syntax Notation One) and EDIFACT. The latter may offer short-term local advantages for some countries, but it is thought that in the long term its adoption may hinder international interoperability. The question of how the item is described is also relevant here.
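To give a feel for the ASN.1 option: ASN.1 describes message structure abstractly, and a set of encoding rules such as BER turns each value into tag, length and value bytes. A minimal sketch of a few BER encodings follows; it handles only single-byte tags, definite lengths and non-negative integers, and the two-field "request" is a hypothetical example, not the ISO 10161 ASN.1 module.

```python
def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Wrap content as tag, length, value (single-byte tags only)."""
    return bytes([tag]) + encode_length(len(content)) + content


def encode_integer(n: int) -> bytes:
    """Universal tag 2; non-negative values only in this sketch."""
    if n < 0:
        raise ValueError("negative integers are not handled in this sketch")
    # bit_length // 8 + 1 bytes always leaves the top (sign) bit clear, e.g. 128 -> 00 80.
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))


def encode_visible_string(s: str) -> bytes:
    """Universal tag 26 (VisibleString)."""
    return tlv(0x1A, s.encode("ascii"))


def encode_sequence(*components: bytes) -> bytes:
    """SEQUENCE: universal tag 16, constructed, encoded as 0x30."""
    return tlv(0x30, b"".join(components))


# Hypothetical: SEQUENCE { transaction-id INTEGER, title VisibleString }
request = encode_sequence(encode_integer(5), encode_visible_string("ab"))
```

Here `request` comes out as the nine bytes `30 07 02 01 05 1a 02 61 62`. Because every value carries its own type and length, a receiver can skip fields it does not understand, which is one reason ASN.1 suits international interoperability.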

The ILL protocol is based on the use of X.400 as the medium for message interchange between the requester and supplier. There is, however, some controversy over whether this is the best approach. X.400 is a ‘store and forward’ protocol, i.e. there is an exchange of messages but no direct real-time interaction between requester and recipients. The US favours an approach based on connection-oriented communication, where a connection is established, data is transferred, and the connection is released.


Document delivery 

Streamlining the interlending process is only one component of document delivery. Users in the ideal information scenario want immediate access not just to a bibliographic reference but to the information itself: the journal article, the book. The SGML and ODA standards, which are relevant to document exchange, will be discussed later. Transmission of images is of increasing interest. Here, relevant standards must cover both the structure of the image file itself and its transmission. For the image file it is necessary to specify, inter alia, the resolution of the scanned image, the compression algorithm, the file format, and a description or identification of the image. GEDI (Group on Electronic Document Interchange), which includes libraries and library utilities from Europe and the USA, is looking at the potential use of OSI applications for the transmission of electronic images of documents. It has recently developed a standard for the exchange of scanned images. Using a de facto standard, TIFF (Tag Image File Format), as the image format, and a standardized format for encoding cover page information such as bibliographic details and request identifiers, this standard enables documents to be passed between different private document delivery domains. The Internet Engineering Task Force (IETF) has also developed a recommendation for image file formats and their transmission, which also uses TIFF as a file header. GEDI, however, is more detailed in its description of the file, and it is possible to identify requester, supplier, bibliographic and transaction details from the GEDI file. Images can be transmitted using X.400 (ISO 10021 or later) or by file transfer, using FTAM or a TCP/IP file transfer protocol. The former has the advantage of allowing delivery direct to users with desktop network access; the latter is more secure.
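Because TIFF is the common denominator here, a receiving system will typically check the file's header before doing anything else. A TIFF file begins with an eight-byte header: a two-byte byte-order mark (`II` for little-endian, `MM` for big-endian), the number 42, and the offset of the first image file directory (IFD). The sketch below checks only that header; it does not read the IFD, and it does not attempt the GEDI cover page, whose layout is not described here.

```python
from __future__ import annotations

import struct


def read_tiff_header(data: bytes) -> tuple[str, int]:
    """Return (byte order, offset of first IFD) or raise ValueError."""
    if len(data) < 8:
        raise ValueError("too short for a TIFF header")
    mark = data[:2]
    if mark == b"II":
        prefix, order = "<", "little-endian"
    elif mark == b"MM":
        prefix, order = ">", "big-endian"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # H = 16-bit magic number (42), I = 32-bit offset of the first IFD.
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    return order, ifd_offset
```

The byte-order mark lets TIFF files move between machines with different native byte orders, which matters when images pass between different document delivery domains.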


6. Standards for the representation of information

General

There are far too many standards covering the representation of information for comprehensive discussion here. The role of CODATA and IUPAC in the development of international standards for the documentation and formatting of scientific and technical data is well established, although much remains to be done. Development of software for handling chemical structures has stimulated the development of de facto standards such as the Chemical Abstracts Registry Number, and standard molecular data formats for the exchange of connection tables between systems. An extensive range of standards has developed in the area of computer graphics. The Human Genome Project has set itself the enormous task of defining and documenting all human genetic codes, and is developing standards for recording these genetic data.


Bibliographic standards 
At a more mundane level, the basic bibliographic 
standards continue to be important. The International 
Standard Book Number (ISBN) and the International 
Standard Serial Number (ISSN) are fundamental, key
standard formats for many applications, including EDI.
The ISSN is administered by the International Serials 
Data System (ISDS) whose central co-ordinating 
agency is based in Paris. The objectives of ISDS are: 
• to develop an international register of serial
publications
• to define and promote the use of a standard code,
the ISSN
• to enable the retrieval of scientific and technical
information in serials
• to promote international standards for bibliographic
description, communication formats and information
in the area of serials.
The ISSN (an eight digit number) is inseparably linked 
with the key title of the serial, a standardized form of 
the title derived from the serial issue as the ISSN is 
created. The ISSN is now incorporated into the EAN 
barcode for periodical publications. The use of the 
ISSN as an authority file by libraries is growing, as
is its practical use within catalogues and for
administrative operations in serials management. ISDS
is also involved with the abbreviation of serial title
words and with the development of other standards
which incorporate the ISSN, for example, standards
for electronic manuscript preparation and mark-up.
The ISSN is of obvious relevance to interlending.
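Because the ISSN serves as an authority key, systems commonly check that a number is well formed before using it. Under ISO 3297 the eighth character is a modulo-11 check character computed from the first seven digits. The sketch below shows that arithmetic; the function names are illustrative and the sample numbers serve only to exercise the calculation.

```python
def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits.

    Digits are weighted 8, 7, ..., 2; the check value brings the weighted
    sum up to a multiple of 11, and a check value of 10 is written 'X'.
    """
    if len(first_seven) != 7 or not first_seven.isdigit():
        raise ValueError("expected exactly seven digits")
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Validate an ISSN written as NNNN-NNNC (the hyphen is optional)."""
    s = issn.replace("-", "").upper()
    if len(s) != 8 or not s[:7].isdigit():
        return False
    return issn_check_char(s[:7]) == s[7]


print(issn_check_char("0378595"))   # 5
print(is_valid_issn("0001-253X"))   # True
```

The check character catches all single-digit errors and most transpositions, which is why it is worth verifying at the point where an ISSN is keyed in.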
The ISBN is similarly well established. However,
one well-known disadvantage is that, while an
ISBN identifies a particular edition of a book, it lacks
a second level of intelligence and cannot be used to
link several versions of the book together. For
cataloguing, two standards are particularly important,
AACR2 and ISBD. The Anglo-American Cataloguing
Rules, AACR2, are probably the best-established
bibliographic standard. AACR2 is an internationally
recognized standard within the English-speaking world,
controlled by a Joint Steering Committee of national
library and library association nominees.

The ISBD (International Standard Bibliographic
Description) developed by IFLA is well accepted as a
standard for descriptive cataloguing, and AACR2
conforms to ISBD. ISBDs exist for monographs,
ISBD(M), and for serials, ISBD(S). ‘Guidelines for
the application of the ISBDs to the description of
component parts’ provides for the systematic
description of chapters in books, journal articles, or
bands on sound recordings. Other ISBDs cover
computer files, cartographic materials, music, etc. All
ISBDs must conform to ISBD(G), the common general
framework. ISBD is being successfully used in
different countries and with a variety of scripts. The
purpose of ISBDs is to make it easier to recognize data
elements whatever the language of their content, to
standardize national practices in the content and
arrangement of the bibliographic record, and to facilitate
the application of computer processes to the
manipulation of bibliographic data. ISBDs are not
cataloguing codes as such (they carry no stipulations
for determining access points, subject classification
or filing), but they have been incorporated into national
cataloguing codes.

Consistent approaches to the description of library 
holdings are currently critical to the sharing of 
information resources but, in their present form, may
well be less relevant to the virtual library of the future.
There is also concern in many countries that cataloguing
rules are now too complex and too detailed to be
followed meticulously.


Indexing and classification 
Free text natural language systems within OPACs
(Online Public Access Catalogues) and other databases
have not yet made redundant either the need to establish
structured shelf locations for monographs within
collections or the need for thesauri. The rapid advance of
scientific and technical knowledge is a particular 
problem for indexing and classification, where 
standards will always lag behind the need to use them. 
The best-known classification systems are best
seen as guidelines but are virtually proprietary
standards, e.g. the Dewey Decimal Classification
(DDC), the Library of Congress Classification and
UDC (the Universal Decimal Classification). Both
Dewey and the Library of Congress Classification
appear on MARC tapes. Dewey is the most widely
used classification in the world. Although Dewey is a
system developed for the English-speaking world, it
has also proved adaptable for different cultures and
countries. UDC is more flexible, less well documented
and has certainly achieved less international penetration
than DDC. Its supporters accord UDC a bright future
and suggest that UDC descriptions and
structure are of potential use for subject searching in
online systems²⁶,²⁷.

Thesauri standards cover two aspects: the structure 
of a thesaurus and its content. Construction standards 
exist for monolingual and multilingual thesauri (ISO
2788 and ISO 5964 respectively). These standards
cover vocabulary control, relationships between terms,
and thesaurus display. They do not appear to be
particularly well known or followed. With respect to
subject content, thesauri evolve to meet the needs of
particular subject areas, e.g. the MeSH (Medical Subject
Headings) thesaurus developed by the National Library
of Medicine, which provides subject terminology that
is well used throughout the biomedical world and is an
important tool to support precise indexing and therefore 
subject retrieval. 


Document descriptions (document portability) 
SGML 
Tagging languages such as SGML and compound 
document architectures such as ODA are emerging as 
the two main strategies for ensuring that documents 
can be exchanged across different computing 
applications and platforms, with their structures intact. 
Both SGML and ODA are ISO standards. Documents 
can be extremely complex — both standards have 
approaches to handling not just text but graphics, data 
files and audio and visual material. SGML is currently
the standard gaining ground, with take-up by the US
government.

SGML (Standard Generalised Markup Language) 
is one of the most exciting standards developments of
recent years. SGML, ISO 8879, was established in
1986. Designed to allow the exchange of information
at any level of complexity among software, hardware,
storage and presentation systems, it is being widely
adopted in academic, government and commercial
sectors. SGML is a language (a tagging language) in
which text structures and content can be expressed. It
provides an unambiguous syntax for describing
whatever a user chooses to identify in a document
without reference to the output device. Descriptive
mark-up can therefore be done once and will suffice for
all future processing. The mark-up serves two purposes:
• separating the logical elements of the document
• specifying the processing functions to be performed
on those elements.

• SGML provides for a character set, generally based
on ASCII, which can be sent safely to any other
system. Unusual characters, e.g. mathematical
symbols and accented foreign characters, are
expressed as 'entity reference codes' and turned
into ASCII representations so that they can be
converted by the receiving system into whatever it
needs to reproduce those characters. Entity reference
codes can also refer to files or data held outside the
document.

• a definition of the structure of the document: a DTD
or document type definition is produced for a
particular type of document, e.g. a book, a company
R&D report, a journal article, and consists of the
specific processing rules for encoding or decoding
a document’s structure and the mark-up tags which
express the structure.
The DTD defines elements that must be present in 
the document. These definitions are called element 
declarations. Element declarations indicate the 
official name of an element which will appear 
inside delimiters as a tag, e.g. <chapter>. They also 
describe what each element may contain, the content 
model. For example, the allowed contents for a 
chapter may be a chapter title, followed by any 
number of paragraphs, interspersed with the 
headings. SGML provides a formal syntax for this 
element declaration. 
The DTD also provides for references to the entity 
reference codes mentioned above and for the 
addition of attributes to an element. The attributes 
may be used to record information associated with 
a textual element but not regarded as a textual 
element in its own right, e.g. confidentiality. The 
DTD describes attribute names, their possible values 
and the elements to which they can be attached. 
Although DTDs do not indicate how to process
non-text objects, a DTD specifies mark-up tags
called escapes which cause the processing program
to jump to an application that can cope with the
non-text object.

• Markup: by providing general facilities for markup,
e.g. <jour> text </jour> to specify a journal title,
SGML does not specify a style in which the
document will be printed but a structure which allows that
style to be specified at a later stage. DSSSL (Document
Style Semantics and Specification Language)
is the international standard which allows expression
of the characteristics to be applied during processing,
e.g. composition, pagination, typography.
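The entity-reference mechanism described in the character-set item above can be imitated in a few lines. As an assumption for illustration, this sketch borrows Python's table of HTML 4 entity names (html.entities.codepoint2name) as a stand-in for an SGML entity set; a real SGML application would take its names from the entity sets declared for its DTD. Characters without a name fall back to numeric character references.

```python
from html.entities import codepoint2name


def to_entity_refs(text: str) -> str:
    """Rewrite text so that it contains only 7-bit ASCII.

    Non-ASCII characters (and the markup-significant '&' and '<') become
    named entity references where a name is known, otherwise numeric
    character references such as '&#257;'.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in "&<":
            out.append(ch)                         # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named reference
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)


print(to_entity_refs("café α ≤ β"))  # caf&eacute; &alpha; &le; &beta;
```

On the receiving side, Python's html.unescape reverses this mapping for the names it knows, which mirrors the step where the receiving system converts references back into whatever characters it can display.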

This description of SGML is, of necessity, simplistic. 
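The chapter content model given in the DTD item above (a chapter title followed by any mix of paragraphs and headings) can be checked mechanically. The sketch below, with hypothetical element names, translates that model into a regular expression and tests a chapter's child elements against it. It assumes explicit, unnested start tags and is in no sense an SGML parser.

```python
import re

# Hypothetical DTD declaration being modelled:
#   <!ELEMENT chapter - - (title, (para | heading)*)>
# i.e. exactly one title, then any number of paragraphs and headings.
CHAPTER_MODEL = re.compile(r"title( (para|heading))*")


def child_tags(marked_up: str) -> list[str]:
    """Names of the start tags in a flat run of markup (end tags are skipped)."""
    return re.findall(r"<(\w+)>", marked_up)


def conforms(children: list[str]) -> bool:
    """True if the sequence of child element names fits the chapter model."""
    return CHAPTER_MODEL.fullmatch(" ".join(children)) is not None


chapter = "<title>Standards</title><para>Text</para><heading>SGML</heading><para>More</para>"
print(conforms(child_tags(chapter)))   # True
print(conforms(["para", "title"]))     # False
```

A validating parser does essentially this for every element declaration in the DTD, which is what allows a document to be rejected before it is exchanged.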

In addition to providing for vendor independence,
streamlining the production process, and providing a
structure for the exchange, publication and display of
electronic documents, SGML also provides the
foundation for the creation of flexible databases,
providing further access to the data for information 
retrieval and for re-use. SGML also offers an approach 
to document structuring for hypertext based retrieval. 
One can immediately see potential for navigating 
documents, providing hypertext links between 
documents in different collections, and for referencing 
other information types. SGML cannot encode other
information types, but HyTime (Hypermedia/Time-based
Structuring Language) is a standard markup
language for representing (tagging) non-text objects
so that they can be rendered as a complete document.
Hypertext, multimedia and hypermedia can also be
accommodated by HyTime, which can also be used
for incorporating time- and space-based documents,
e.g. sound and video.

There are, however, limitations to SGML: SGML
does not specify documents, it specifies DTDs, and
incompatible DTDs defeat the purpose of universal
document exchange.

While SGML is the most prominent standard for
document structure, there is also support for the OSI
standards ODA (Office Document Architecture, ISO
8613) and ODIF (Office Document Interchange
Format). ODA/ODIF allows exchange of compound
documents containing text, graphics and data. ODA
codes documents as in-memory arrays called aggregates
which can represent audio, graphics, text and video, as
well as the document’s physical format, logical
organization and text styling. In principle, ODA
formats are designed to link with X.400 to provide a
wide range of permissible message contents.

SGML and ODA offer the facility to handle 
electronic documents very flexibly both for display 
and for re-use. 


Final form document interchange 
These standards aim to describe the format of a printed
page in a manner that is printer independent.

Several de facto standards have emerged in this
area, of which Adobe’s PostScript and Xerox’s
Interpress for high-speed printers are worth noting.
The SPDL (Standard Page Description Language) is
one of the CALS (Computer-Aided Acquisition and
Logistic Support) initiatives to standardize formats
for printing technical publications.


7. Access to information 

As the volume of electronic sources grows so does the 
challenge of identifying relevant and useful 
information. How does one navigate a network? 


X.500 

X.500 is a standard for directory services and one of 
the general OSI application layer services. The directory 
is a distributed database and in its simplest manifestation
functions as a telephone directory for electronic mail
users, with information on people and organizations
available for online inspection, e.g. e-mail addresses,
phone numbers, postal addresses, fax numbers,
organizational affiliation, etc. A user interacts with
the Directory by means of a Directory User Agent
(DUA). This links with a Directory Service Agent
(DSA) which provides access to the database. Queries
to a DSA can be passed on to other DSAs; together
these DSAs provide the global directory. 
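Passing a query from one DSA to another (chaining) can be sketched as below. Everything here is a simplification made for illustration: distinguished names are plain strings, each DSA covers a single naming context, the entries and addresses are made up, and the actual DUA-DSA and DSA-DSA protocols are not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DSA:
    """Toy Directory Service Agent responsible for one naming context."""
    context: str                                   # e.g. "c=FR"
    entries: dict = field(default_factory=dict)    # DN -> attribute dict
    peers: list = field(default_factory=list)      # other DSAs it can chain to

    def holds(self, dn: str) -> bool:
        """True if dn lies inside this DSA's naming context."""
        return dn == self.context or dn.endswith("," + self.context)

    def lookup(self, dn: str, visited: Optional[set] = None) -> Optional[dict]:
        """Answer locally if the DN is ours; otherwise chain to peer DSAs."""
        visited = set() if visited is None else visited
        if id(self) in visited:        # already asked: stop chaining loops
            return None
        visited.add(id(self))
        if self.holds(dn):
            return self.entries.get(dn)
        for peer in self.peers:
            found = peer.lookup(dn, visited)
            if found is not None:
                return found
        return None


def dua_lookup(home: DSA, dn: str) -> Optional[dict]:
    """A Directory User Agent hands every query to its home DSA."""
    return home.lookup(dn)


# Two cooperating DSAs with made-up entries.
uk = DSA("c=GB")
fr = DSA("c=FR", {"cn=J Dupont,o=Example Library,c=FR": {"mail": "dupont@example.org"}})
uk.peers.append(fr)
fr.peers.append(uk)
print(dua_lookup(uk, "cn=J Dupont,o=Example Library,c=FR"))  # {'mail': 'dupont@example.org'}
```

The user only ever talks to the local DSA; whether the answer came from the UK or the French DSA is invisible to them, which is what lets the separate DSAs behave as one global directory.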

Adding information about resources, e.g. collections
available through the network, to the entries for people
and organizations offers one practical route to identifying
useful sources. Information about collection strengths,
ILL policies and databases would all be possibilities.
The National Library of Canada has identified various
candidates for distribution via an X.500 directory, e.g.
publisher information, potential targets for ILL requests.
The GEAC OSI group is reported to be considering the
development of a directory scheme suitable for library
applications. Functionalities suggested include remote
search of online public catalogues; acquisitions support,
e.g. remote search and retrieval from book vendor
databases; support of ILL (e.g. to locate items); and
capabilities which will allow libraries to control and
monitor the extent of their participation in the wider
community. The GEAC plans are far in advance of
other implementations of X.500. The search format
proposed is the Common Command Language (to be
mentioned later). Within the UK an academic X.500
project is exploring the use of directory services, as is
Project ION, the EC-supported interlending OSI
network pilot/demonstration project between Laser,
Pica (Netherlands) and PEP (interlibrary loan system
of the French Ministry of Education).


Z39.50 and search & retrieve
Together with SGML, Z39.50 and Search & Retrieve
standards are currently the most important
developments for information management. At the moment
retrieval of information held on different hosts requires
the user to understand the search details for each
system. The ISO information retrieval protocol, Search
and Retrieve (SR) ISO DP 10162/10163, and the parallel
US NISO Standard Z39.50 are aimed at solving this
problem. There is a commitment to harmonize the
second version of Z39.50 (Z39.50-1992) with SR so
that SR will be a functionally compatible subset of
Z39.50.
Z39.50/SR can be envisaged as having two 
components: 
• a generic search format in which searches can be
expressed
• the protocols for managing the transmission of the
query to the host computer, the management of the
search process, and the return of the results.
The search and record transfer framework is designed
to be independent of the types of information being
retrieved, although early experiments have focused on
bibliographic databases. The local (origin) system's
searching commands are mapped onto a generic
command format. The target system maps these
commands onto its own local format. In this way
users can search remote systems while using their own
familiar commands.
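The two-stage mapping just described can be sketched as follows. Both local command syntaxes are invented for illustration, and a plain (access point, term) tuple stands in for the structured query that the protocol actually carries between origin and target.

```python
# Hypothetical origin-side command:    "find ti standards"
# Generic form exchanged between them: ("title", "standards")
# Hypothetical target-side syntax:     "TITLE:standards"
ORIGIN_CODES = {"ti": "title", "au": "author", "su": "subject"}
TARGET_FIELDS = {"title": "TITLE", "author": "AUTHOR", "subject": "SUBJ"}


def origin_to_generic(command: str) -> tuple[str, str]:
    """Map a local 'find <code> <term>' command onto (access point, term)."""
    verb, code, term = command.split(maxsplit=2)
    if verb.lower() != "find":
        raise ValueError(f"unsupported command: {verb}")
    return ORIGIN_CODES[code.lower()], term


def generic_to_target(query: tuple[str, str]) -> str:
    """Render a generic query in the target system's own command syntax."""
    access_point, term = query
    return f"{TARGET_FIELDS[access_point]}:{term}"


generic = origin_to_generic("find ti standards")
print(generic)                     # ('title', 'standards')
print(generic_to_target(generic))  # TITLE:standards
```

The point of the design is that each system needs only one mapping, to and from the generic form, rather than a separate translation for every other system it might search.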


The standard uses a common abstract model to
describe databases. The unit of retrieval is a record and
all the records within a given file have a common
structure, consist of a common set of data elements,
and have a common set of access points. The protocol 
aims to be sufficiently general to apply to a wide 
variety of information sources, from bibliographic to 
numeric to full text databases. 

The standard includes mechanisms to exploit three
query types; these include ISO 8777 (Common
Command Language) and the RPN Query (reverse Polish
notation). The RPN Query provides for Boolean operators,
relational operators and truncation, but not for proximity
operators. The semantics of an RPN query are defined by
an attribute set. One attribute set, BIB-1, has been
registered within the standard and provides for the
expression of queries to bibliographic databases. Other
attribute sets will be defined, e.g. for full text searching.
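A query in reverse Polish order can be evaluated with a stack: each operand pushes a result set, and each operator pops two sets and pushes their combination. The sketch below covers the Boolean operators and right truncation mentioned above (not relational operators). Its field=term operands and '?' truncation mark are illustrative conventions rather than the BIB-1 encoding, which identifies access points and truncation by numbered attribute types and values, and the index is made-up data.

```python
OPERATORS = {"AND", "OR", "NOT"}

# Hypothetical inverted index: field -> term -> record numbers.
INDEX = {
    "ti": {"standards": {1, 2}, "standardization": {3}, "thesauri": {4}},
    "au": {"keen": {2, 3}},
}


def evaluate_rpn(tokens: list[str], index: dict) -> set[int]:
    """Evaluate a postfix query such as ["ti=stand?", "au=keen", "AND"].

    Operands are field=term pairs; a trailing '?' requests right truncation.
    AND, OR and NOT (read as AND NOT) each combine the top two result sets.
    """
    stack: list[set[int]] = []
    for tok in tokens:
        if tok in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"{tok} needs two operands")
            right, left = stack.pop(), stack.pop()
            if tok == "AND":
                stack.append(left & right)
            elif tok == "OR":
                stack.append(left | right)
            else:
                stack.append(left - right)
        else:
            field, term = tok.split("=", 1)
            postings = index.get(field, {})
            if term.endswith("?"):   # right truncation: every term with this stem
                stem = term[:-1]
                hits = set().union(*(ids for t, ids in postings.items() if t.startswith(stem)))
            else:
                hits = set(postings.get(term, set()))
            stack.append(hits)
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]


print(evaluate_rpn(["ti=stand?", "au=keen", "AND"], INDEX))  # {2, 3}
```

Postfix order needs no parentheses or precedence rules, which is one reason it suits a protocol that must be interpreted identically by many different target systems.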

There are differences currently between the two 
standards. Z39.50 has two optional services: Resource
control and Access control. The target may exercise
access control by demanding authentication before 
processing the search. The target can also impose 
resource control by advising the user of operations that 
are taking a long time and allowing the user to decide 
whether or not to continue. 
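The resource-control idea, where the target pauses a long-running search and lets the user decide whether to go on, is sketched below. The cost units, the chunking of the search and the callback are hypothetical stand-ins; the sketch shows only the control flow, not the protocol messages that Z39.50 defines for this service.

```python
from collections.abc import Callable, Iterable


def search_with_resource_control(
    chunks: Iterable[tuple[float, list[str]]],
    report_interval: float,
    continue_search: Callable[[float], bool],
) -> tuple[list[str], bool]:
    """Process a search in chunks, consulting the origin as cost accumulates.

    chunks:          (cost, hits) pairs produced by the target as it works.
    report_interval: cost units between reports to the origin.
    continue_search: origin-side callback; gets the cost so far and returns
                     True to carry on or False to abandon the search.
    Returns the hits gathered and whether the search ran to completion.
    """
    hits: list[str] = []
    spent = 0.0
    next_report = report_interval
    for cost, found in chunks:
        # Before starting more work, report if the threshold has been reached.
        if spent >= next_report:
            if not continue_search(spent):
                return hits, False
            next_report += report_interval
        spent += cost
        hits.extend(found)
    return hits, True


work = [(1.0, ["rec1"]), (1.0, ["rec2"]), (1.0, ["rec3"])]
print(search_with_resource_control(work, 2.0, lambda spent: False))
# (['rec1', 'rec2'], False)
```

Checking before each new chunk, rather than after, means the user is only asked when there is further work that could actually be saved.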

In addition to allowing a user to query several
remote systems with a single language, Z39.50/SR
also has the potential to support interfaces between different
modules of the same system, for example to allow an
ILL or acquisitions system to search a catalogue. It
should also be usable for record transfer
between a local system and a central service.
Front-ending a less user-friendly search system is another
practical use. The challenge will be to design a
functionally rich interface for a variety of systems.

Criticisms of the standard in its present phase of
development are that it does not support the process of
finding relevant sources, takes a simplistic approach to
delivery (failing to define location, method or quality)
and is insufficiently well defined in the areas of
security and accounting. The standard does not
yet allow browsing of indexes or thesauri.

There are a number of SR/Z39.50 implementations
under development. The Mercury project at
Carnegie-Mellon University is aimed at developing a
virtual library, including supply of images of printed
text over the network.

Thinking Machines are developing the Wide Area
Information Server (WAIS) and have extended the
Z39.50 protocol to support relevance feedback to meet
the searching needs of non-information specialists.
The user can select one or two documents as relevant
and ask the database server to provide similar
documents. Typically each document is described by a
‘headline’. If the headline is of interest then the full
document can be displayed. WAIS also provides for
the retrieval of parts of large documents and the retrieval
of multimedia documents.
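The paper does not say how WAIS judges similarity. Purely as an illustration, the sketch below swaps in a common textbook technique, cosine similarity between word-count vectors, to rank the remaining documents against those the user marked as relevant. The function names and the document texts are made up.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Word counts for a document (lower-cased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def more_like_these(relevant: list[str], docs: dict[str, str], k: int = 3) -> list[str]:
    """Rank the other documents by similarity to the user's chosen ones."""
    profile = Counter()
    for doc_id in relevant:
        profile.update(term_vector(docs[doc_id]))
    scores = {d: cosine(profile, term_vector(t)) for d, t in docs.items() if d not in relevant}
    return sorted(scores, key=scores.get, reverse=True)[:k]


docs = {
    "d1": "sgml markup documents",
    "d2": "sgml document type definition markup",
    "d3": "keyboard layouts",
    "d4": "markup languages sgml",
}
print(more_like_these(["d1"], docs, k=2))  # ['d4', 'd2']
```

The attraction for non-specialists is that they never formulate a query: pointing at a good document is enough for the server to build one from its contents.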

The Berkeley Network Information Server Project 
is of interest since it will support access to a range of 
text materials in an academic environment: course
catalogues, dictionaries, events listings and training 
materials. 

The possibilities for Z39.50/SR are tremendous 
although they will take considerable time to realize. 


Common command language (CCL) 

Both ISO and NISO have developed standards for
common command languages. ISO 8777 is an approved
standard which can be used within the Search and
Retrieve protocol. ISO 8777 took as its starting point
the Euronet CCL, developed for the Euronet-DIANE
host system. The NISO standard, Z39.58, has developed
separately. Both have strong similarities but differ in
the areas of search qualifiers and proximity operators.
Experience so far suggests that it is unlikely that
Common Command Languages will ever make real
inroads in the software, online or CD-ROM markets,
since these thrive on difference. However, it does
appear likely that within an organization it may be 
possible to standardize on a single user interface for 
internal and external resources by exploiting the Z39.50/ 
SR standard. 
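An origin system that accepts familiar infix commands but submits RPN queries has to reorder the operators into reverse Polish (postfix) order. The sketch below does this with the shunting-yard method for a CCL-style query. The field=term qualifier syntax is hypothetical, and the rule that AND, OR and NOT share one precedence level and group left to right is an assumption made for the sketch, not a statement of ISO 8777's rules.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"AND", "OR", "NOT"}


def ccl_to_rpn(query: str) -> list[str]:
    """Convert an infix, CCL-style query into a postfix token list.

    Assumption: all Boolean operators have equal precedence and group
    left to right; parentheses override this.
    """
    output: list[str] = []
    ops: list[str] = []
    for tok in TOKEN.findall(query):
        word = tok.upper()
        if word in OPERATORS:
            # Equal precedence, left-associative: flush pending operators first.
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            ops.append(word)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced parentheses")
            ops.pop()  # discard the matching "("
        else:
            output.append(tok)  # an operand such as ti=standards
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


print(ccl_to_rpn("ti=sgml or (au=keen and su=thesauri)"))
# ['ti=sgml', 'au=keen', 'su=thesauri', 'AND', 'OR']
```

A single front end of this kind, translating one in-house command language into the protocol's query form, is the sort of organization-wide standard interface the paragraph above anticipates.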


8. Human computer interface 

The user interface is critical to the successful use of traditional
and electronic information sources. Consistent
approaches for search and retrieval will provide major 
steps forward but will still leave debates such as menus 
versus commands unresolved. It is impossible to do 
justice to such a vast topic here. However, a few 
standards-related activities are of current interest. 


CD-ROM

As CD-ROM sources are at a comparatively early
stage of evolution, the development of a standard user
interface would be timely. The Library and Information
Technology Association of the ALA (American Library
Association) has established a CD-ROM Consistent
User Interface Committee, CD-CINC. CD-CINC
proposes to define the basic functions inherent to
contemporary search and retrieval software and to
suggest a consistent set of terms for these functions.
The NISO CD-ROM Standards Committee is formulating
a new ANSI CD-ROM standard which will make it
possible to retrieve data from any CD-ROM disc
within any operating system into any user interface.

Developments like these should simplify the life of
the user considerably and keep training effort to a
minimum, allowing the user to concentrate on strategy
rather than the logistics of the user interface.
Developments in display technology have supported
the development of bit-mapped or graphic displays in
the last few years. Many graphic user interfaces (GUIs)
have emerged, e.g. Xerox's Viewpoint, Apple
Macintosh, NeXT and Microsoft Windows. The
X-Windows display management standard is now viewed
as a de facto standard for GUIs on UNIX-based systems,
and the emergence of this standard is important for the
design of consistent user interfaces to information
applications. The aim, as with other standards, is that
the kinds of data and commands expected should be
independent of the hardware used.

One of the most frustrating aspects of computer 
use is the inconsistent functionality of the keyboard. 
An ISO/IEC JTC 1 working group is working on a
series of standards that will address keyboard layouts,
dialogue interaction and symbols.


9. Quality of scientific and technical information 

Before concluding this paper, I wish to look at two
other aspects of standards activity. The evolution of
the electronic information world could change the
traditional approaches to scientific and technical
publishing considerably. Indeed, the future existence
of publishers continues to be questioned, since the
growing feasibility of electronic publishing will make
it possible for researchers to publish via a network
directly to the reader. This raises the question of
whether the traditional approaches to guaranteeing the 
quality of publications by editorial control and peer 
review can survive in an electronic environment. There 
are critics of this review process who feel that it suffers 
from bias and conflict of interest and may suppress 
innovation. Equally, there are those who are concerned 
that uncontrolled electronic publication will lead to a 
flood of invalid material being disseminated. These 
issues continue to be hotly debated. 


10. Quality management and quality standards 
The emergence of quality management standards as 
one outcome of the increasing penetration of the quality 
assurance concept is already influencing information 
activity in Europe. BS 5750 in the UK, the ISO 9000 
series of standards (the first standards to be issued as 
guidelines), and EN 9000 in Europe are concerned 
with quality management. 

BS 5750 certification aims to provide a guarantee 
of quality performance or quality products to customers 
through an accreditation programme. This registration 
process ensures that product quality is defined, key
processes are documented, standards for performance
are set, and valid procedures exist for vetting and
monitoring adherence to standards. Adopting BS 5750
is strengthening the role of standards generally in the
UK, since an organization will often need or want to
reference other standards as part of its quality assurance
process — insisting, for instance, that equipment meets 
a relevant national or international standard. 

Certification is, however, less important than 
managing an organization to ensure quality and many 
businesses and public sector organizations are now 
embracing the TQM (Total Quality Management) 
philosophy.

The aim of TQM is to ensure that the needs of the
customers and the objectives of the organization are
satisfied in the most efficient and cost-effective way,
by enlisting the support of employees in a continual
drive for improvement. There are six basic tasks in any
quality programme:

• defining the customer and the service
• setting standards or performance measures
• measuring the process or the service
• reviewing this performance with the customer
(internal or external)
• determining the causes of problems and resolving
them
• improving the process or service

Within the scientific and technical information
community, quality management philosophy is
potentially applicable to every aspect of activity, e.g. 
database creation, indexing, cataloguing, publication, 
document supply, enquiry handling and customer 
service. At the macro scale, the performance of an 
organization can be measured. 


Aslib Proceedings, vol.46, no.1 


Standards: their relevance to scientific and technical information 





The Library Association in the UK is drafting a set 
of unambiguous, quantitative standards which will 
define what constitutes a good library service in the 
public sector. Standards for libraries have existed 
within North America for many years. A new initiative, 
EQUIP, is examining current approaches to quality 
management in the provision of Information Services 
in Europe. The programme is supported by GAVEL, 
EUROLUG, and EUSIDIC (European Association of 
Information Services). Phase 1 includes an extensive 
survey of the impact of ‘quality’ in information service
provision and the quantification of quality. Later phases 
will lead to a European Accreditation Scheme. The 
ISO, too, has a sub-committee which is conducting a 
feasibility study into standard performance measures 
in libraries. 

Staff quality is, of course, critical to service quality. 
The development of professional standards for staff of 
all levels involved in information activity will provide 
a firm foundation for training, development and 
certification of levels of competence. Within the UK, 
a major initiative to establish National Vocational
Qualification standards is now well under way and this 
is being actively supported by organizations 
representing librarians, information scientists and other 
information workers. 

Nationally, it is important that employers, professional
associations and training organizations (universities,
colleges and commercial training suppliers) examine:
• what types of information work exist within the
country
• what different levels of skills and competence can
be identified as essential for good quality work.

It is equally important that mechanisms exist for
accrediting people which clearly demonstrate that
the appropriate skill level has been achieved and
which reward the individual for skill improvement.


11. Prioritizing standards activity 

As I have researched for this symposium I have been 
amazed at the breadth and penetration of standards, 
and at my own ignorance of the dimensions of standards
activity. 

Standards have enormous potential to confuse — 
which standards are important to which activities; 
what is the status of their development; are the standards
really applicable to my environment; do products 
based on the standard yet exist. 

Standards are also frustrating. I have noted a number 
of what appear to be competing standards and have 
been concerned at the apparent waste of human effort 
in their maintenance and further development. It also 
seems inevitable that the laborious procedures for 
developing standards on both a national and an 
international scale will mean that standards lag behind 
the necessity to use them and that, in the interim, one 
will need to adopt products and processes that may 
subsequently need to be changed. The promotion and 
marketing of standards often suffers from neglect. 

Standards nevertheless are vital to the
documentation, exchange and retrieval of scientific and
technological information. The effort put into their
development and use should be far outweighed by the
economies thus gained.
In determining standards priorities within an
organization the following questions are germane:

• what are the organization's objectives?
• what are the key activities and goals which will
ensure that the objectives are achieved?
• which activities can be facilitated by the adoption
of standards?
• do relevant organizational, national or international
standards exist?
• what is their status?
• are they mature enough to be adopted?
• do they require adaptation to the organizational
environment? Are they suitable for adaptation?
• is it essential that they are adopted?
• which are the most important?
• is any training or external support required to
facilitate introduction of the standard?
• does the organization need to influence their further
development?
• what are the mechanisms for exerting influence?
• are there any activities for which the organization
should write its own standards?


These questions are also relevant on a national scale. 
1. Introduction

Consultancy in the West is relatively big business. In 
1991, the top 25 European consulting firms earned 
over £2,000 million. In 1992, income earned in the UK
alone by the top 25 topped £1,000 million; and a
surprisingly high number of consultancies were untouched
by the recession. Some even increased their profitability 
— in some cases quite dramatically (Table 1). 


Table 1. UK Fee Income¹

                         UK Fee Income     Change (%)
Firm                     1992     1991     This     Last
                         (£m)     (£m)     period   period
Andersen Consulting      214.6    172.1    24.7     32.3
Coopers & Lybrand        124.0    136.0    -8.8     -9.3
Price Waterhouse         99.1     103.0    -3.7     n/a
PA Consulting            93.9     97.8     -3.9     11.2
KPMG                     81.8     84.5     -3.1     14.8
Wm Mercer Fraser         70.0     70.0     0.0      n/a
Touche Ross              58.3     54.5     6.9      -0.9
P-E International        51.8     55.6     -6.8     4.1
Ernst & Young            47.0     47.0     0.0      n/a
BIS Group                37.0     33.0     12.1     n/a
Towers Perrin            28.0     29.3     -4.4     0.0
Capita                   27.0     24.7     9.3      23.5
CMG                      25.0     25.0     0.0      n/a
Arthur D Little          21.6     21.0     2.8      -19.2
Hay                      18.0     18.0     0.0      n/a
EDS-Scicon               16.0     15.0     6.6      n/a
Boston Consulting Group  14.0     13.7     2.1      8.7
CSL                      14.0     15.0     -6.6     2.1
Sema Group               12.8     7.8      64.1     n/a
Booz Allen & Hamilton    12.5     12.5     0.0      8.6
Stoy Hayward             8.0      8.0      0.0      -18.3
Oasis                    7.0      9.1      -23.0    89.6
WS Atkins                6.0      5.2      15.3     62.5
BDO Consulting           6.0      6.0      0.0      4.7
Doctus                   5.5      5.0      10.0     -6.9
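The change columns in Table 1 are year-on-year percentages of fee income. As a quick check on how they are derived, the short sketch below recomputes the 'this period' figure for three firms from their 1992 and 1991 fee income; the function name is illustrative.

```python
def pct_change(current: float, previous: float) -> float:
    """Year-on-year change, as a percentage of the previous year's figure."""
    return (current - previous) / previous * 100


# (1992, 1991) UK fee income in £m, as listed in Table 1.
fees = {
    "Andersen Consulting": (214.6, 172.1),
    "Coopers & Lybrand": (124.0, 136.0),
    "Sema Group": (12.8, 7.8),
}

for firm, (y1992, y1991) in fees.items():
    print(f"{firm}: {pct_change(y1992, y1991):+.1f}%")
# Andersen Consulting: +24.7%
# Coopers & Lybrand: -8.8%
# Sema Group: +64.1%
```

The percentages are taken relative to the earlier year, so a small firm such as Sema Group shows a far larger change than Andersen Consulting despite a much smaller absolute gain.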





Many of the big names also appear to have
done well out of the formation of the European single
market as the national infrastructures adapt and
information exchange patterns change. The biggest players
were earning only about 25% of their European income
in the UK, while one was almost half and half (Table 2).
What is particularly interesting to note is that, of
the income generated in 1992, 46.4% was from work
in Information Technology, 14.5% from Financial
Management, and under 40% from all the other
consulting areas (Table 3).


Table 2. European Fee Income¹

                              1992      1991
                              (£m)      (£m)
Andersen Consulting           626.9     526.0
Coopers & Lybrand             309.8     280.0
KPMG                          262.4     274.6
PA Consulting                 107.1     112.0
CMG                           89.0      79.0
P-E International             63.9      66.0
Arthur D Little               53.8      42.4
Towers Perrin                 43.5      44.5
BIS Group                     42.0      37.0
Stoy Hayward                  42.0      27.9
Sema Group                    39.5      29.0
CSL                           14.0      15.0
Oasis                         10.7      10.2
Doctus                        9.0       8.0
Braxton                       6.0       6.0
COBA MID                      6.0       5.5
Merchants                     5.5       3.4
Collinson Grant               5.1       4.6
Nichols Associates            5.1       5.7
Coverdale Organisation        3.5       3.5
TCA                           3.1       2.9
DBI                           2.1       2.0
Beaufort                      2.0       1.1
MMM                           1.7       1.7
Dent Lee Witte                1.7       1.8
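The claim that the biggest players earned "only about 25%" of their European income in the UK can be checked against Tables 1 and 2, but the answer depends on whether Table 2's European figures already include UK income, and the table does not say. The Python sketch below computes the UK share both ways for the four largest firms. The two readings are an assumption about how the tables relate, not something the source states, and `uk_share` is an illustrative helper.

```python
# 1992 fee income (£m): UK figures from Table 1, European figures from Table 2.
UK_1992 = {
    "Andersen Consulting": 214.6,
    "Coopers & Lybrand": 124.0,
    "KPMG": 81.8,
    "PA Consulting": 93.9,
}
EUROPE_1992 = {
    "Andersen Consulting": 626.9,
    "Coopers & Lybrand": 309.8,
    "KPMG": 262.4,
    "PA Consulting": 107.1,
}


def uk_share(uk: float, europe: float, europe_includes_uk: bool) -> float:
    """UK income as a percentage of the firm's total European income.

    If the European figure already includes the UK, it is the total;
    otherwise (an assumption about Table 2) the total is uk + europe.
    """
    total = europe if europe_includes_uk else uk + europe
    return uk / total * 100


for firm, uk in UK_1992.items():
    europe = EUROPE_1992[firm]
    included = uk_share(uk, europe, europe_includes_uk=True)
    excluded = uk_share(uk, europe, europe_includes_uk=False)
    print(f"{firm:<20} includes UK: {included:4.1f}%  excludes UK: {excluded:4.1f}%")
```

Only the second reading gives figures close to the text: between 23.8% and 28.6% for the three largest firms, and 46.7% ("almost half and half") for PA Consulting. This suggests, but does not establish, that Table 2 reports income earned outside the UK.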





Table 3. How the consultancy market split in 1992¹

                                      Proportion of fees     Value
                                      of top 50 firms (%)    (£m)
Information Technology                46.4                   539.5
Financial Management                  14.5                   168.6
Strategy & Organization               7.4                    86.0
Operations Management                 5.2                    60.5
Manufacturing                         2.1                    24.4
Technology/Innovation Management      4.3                    50.0
Project Management                    4.6                    53.5
Marketing & Product Development       2.3                    26.7
Human Resources                       5.0                    58.1
Change Management                     3.0                    34.9
Other                                 5.2                    60.5

TOTAL                                 100.0                  1162.7
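Because each value in Table 3 should equal the area's share of the £1,162.7m total, the table can be checked arithmetically, and the same figures confirm the text's "under 40%" for the remaining areas (100 minus 46.4 minus 14.5 gives 39.1). A minimal Python sketch follows, with the table transcribed into a dictionary. The names `TABLE_3` and `max_rounding_gap` are illustrative.

```python
TOTAL_FEES = 1162.7  # £m, Table 3 total for the top 50 firms

# Consulting area -> (share of fees in %, value in £m), as printed in Table 3.
TABLE_3 = {
    "Information Technology": (46.4, 539.5),
    "Financial Management": (14.5, 168.6),
    "Strategy & Organization": (7.4, 86.0),
    "Operations Management": (5.2, 60.5),
    "Manufacturing": (2.1, 24.4),
    "Technology/Innovation Management": (4.3, 50.0),
    "Project Management": (4.6, 53.5),
    "Marketing & Product Development": (2.3, 26.7),
    "Human Resources": (5.0, 58.1),
    "Change Management": (3.0, 34.9),
    "Other": (5.2, 60.5),
}


def max_rounding_gap(table: dict, total: float) -> float:
    """Largest difference between a printed value and share% x total."""
    return max(abs(pct / 100 * total - value) for pct, value in table.values())


print(round(sum(pct for pct, _ in TABLE_3.values()), 1))      # 100.0
print(round(sum(value for _, value in TABLE_3.values()), 1))  # 1162.7
print(round(max_rounding_gap(TABLE_3, TOTAL_FEES), 3))
it = TABLE_3["Information Technology"][0]
fm = TABLE_3["Financial Management"][0]
print(round(100 - it - fm, 1))  # 39.1
```

Every printed value lies within £0.05m of its share of the total, which is what rounding shares to one decimal place would produce.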


The first of these figures is not surprising when one considers the huge capital expenditures demanded by the purchase of hardware and software and the need to optimize running costs and benefits, though it should be recognized that much of this income will have been generated from the lucrative business of facilities management. Similarly, the second figure reflects the huge sums of money handled by financial systems and services and the need to regulate and control financial transactions and cash flows. (It might be added, in passing, that some of the most spectacular failures have occurred where information technology and financial management have come together: not long ago the London Stock Exchange's TAURUS system for the automated management of share dealing was written off at huge expense.) The world of management consultancy is dominated by a few big firms, a dominance reinforced by recent mergers; these firms mainly have their roots in accountancy or engineering and have evolved to meet the demands described above.

In contrast, information consultancy is very small 
business. The library and information science 
professions are small compared to those involved in 
accountancy or computing. For example, in the UK the 
Institute of Information Scientists has about 2,500 
members compared with some 200,000 in the various 
associations of accountants; 76,000 in the Institute of 
Management and some 35,000 in the British Computer 
Society (which last figure is a very poor reflection of
the total number working in the IT sector). 

But, more importantly, the demand for information 
consultancy is concentrated in those areas where the 
information is relatively hard and well understood — 
notably information for control purposes (i.e. 
management information and largely financial) or for 
marketing purposes (i.e. market surveys, competitor 
intelligence and, again, financial and economic 
information). The more abstract forms of information,
largely text-based and relatively unstructured, are not
easily seen by management to be areas in which to
invest: what is the value of such information?
Consequently, the attitudes of the consultant’s
customers have created neither a strong market pull
nor a technology push, except perhaps with regard to
products such as CD-ROM. Even here, many of the 
early products were extremely primitive from the point 
of view of an information specialist, while some of the 
extravagant claims made by suppliers (and some 
management consultants) for such things as hypertext 
and Executive Information Systems have also been 
met with justifiable scepticism. 

Most consultancy has both a technical and a human dimension, and while there is a place for almost completely technical consultancy, the human dimension not only cannot be ignored but is becoming increasingly important. Information management (IM) is still a relatively new and poorly understood concept; meanwhile, management consultants and information consultants are approaching IM in different ways. However, both require very similar general skills and expertise.


2. The nature of consultancy


There are, perhaps, a number of definitions of what constitutes consultancy; but there are two clear statements to be made. First, it is not just about giving advice, and second, it encompasses some or all of a hierarchy of increasingly complex activities, as shown in Table 4.

Table 4. A hierarchy of consulting purposes²

Additional goals
8 Improve organizational effectiveness
7 Facilitate client learning
6 Build consensus and commitment
5 Assist implementation

Traditional purposes
4 Provide recommendations
3 Conduct diagnosis that may redefine problem
2 Provide solution to given problem
1 Provide requested information


It can be seen from this hierarchy that the growing 
number of information brokers operate at the lowest 
(albeit important) level and possibly within certain 
defined limits in the second and third levels. Advice 
(and again, this can be important and an end in itself) 
stops at level four. Increasingly, though, clients demand more than reports, strategic plans, information audits or specifications, and ask for assistance with the follow-through processes of implementation and beyond. This requires a degree of commitment that only the full-time professional consultant can offer.

By professional, I mean not only somebody who earns his or her living from consultancy, but someone who in so doing adopts certain attitudes to the work and to continuous personal development, and who is willingly subject to professional codes of conduct.

There are a number of reasons why organizations 
call in a consultant, some of which are: 

a) the organization lacks some particular technical 
expertise (this is increasingly common as 
organizations trim their work force, and hire 
specialists only when needed) 

b) the organization has the skills, but not the time and 
so buys in an extra pair of hands 

c) to break a deadlock between opposing factions, 
essentially acting as referee; or to get a second 
opinion on a particular problem. 

Whatever the reason, the ultimate objective must be to contribute to organizational performance and so, referring again to the hierarchy above, it is clear that the consultant must become deeply involved in the culture and mechanisms of the organization and be prepared to adopt an ‘interventionist’ role. But the consultant has no executive power and must act purely as a catalyst for change. The first key to potential success is the establishment of good contractual and personal relationships with the client. The second key is the consultant's impartiality. By definition, the consultant is an outsider who has no personal stake in the organization as a whole, or any part of it, or any of its suppliers. He or she is engaged under contract to undertake specified tasks in a specified time scale, to deliver what has been agreed, and to leave the organization in better shape at the end of that contract.


3. Key aspects of consultancy 

As was stated earlier, consultancy has both a technical 
and a human component, though these are closely 
linked. It is assumed here that the technical component of information work need not be discussed. In addition to the need for interpersonal skills in communication via the spoken and written word and in training, and the necessary ability to relate easily to people, there are three areas of particular importance.


3.1 The consultant-client relationship 
The first task that the consultant has is to review and 
establish the real problem. Often, the client has a clear 
appreciation of the problem and is able to define most 
of the parameters. Equally often, the client has not 
identified the problem or more commonly has mistaken 
the symptom for the complaint. If the consultant and 
the client cannot agree on what needs to be done, then 
there is no point in continuing further. Sometimes there is a middle road, where both sides agree to work together on a phased project starting with the identification of problems (or, equally, of opportunities). The second task is to establish the role of the consultant, together with any constraints or limits. Broadly speaking, the consultant can adopt either or both of a ‘resource role’, aimed at suggesting ‘what’ to change, and a ‘process role’, aimed at suggesting ‘how’ to change.

In the longer contract, the consultant can become closely acquainted with the organization's personnel, and it is important to distinguish (if necessary) between the person who engaged the consultant, the person who is most likely (for one reason or another) to implement the recommendations, and the final arbiter. The consultant must ensure that he or she can talk directly to the highest appropriate level and must be able to put into perspective any friendship that might develop at the personal level. It has to be remembered that it is the organization that is the client, while it is the individuals who are the instruments of change.

In short, the job of the consultant is to impart what 
he can and to leave as soon as the client can continue 
on his own. Table 5 illustrates the complementary 
nature of the relationship. 


3.2 Organizational culture 
In addition to the obvious factors of nationality, 
profession, sex and age, every organization has its own 
culture, and the consultant must be sensitive to the 
differences and their implications. How an organization 
behaves and is structured will depend on its objectives 
and on the people in control; and on how it relates to 
other organizations. The most common cultures (in a highly generalized way) are:
Bureaucracy, with a hierarchical structure
Team, with a matrix structure
Entrepreneur, with a spider's web structure
But organizations are made up of individuals, and their culture reflects the complex interplay among those individuals, and between the individuals and the organization, both singly and in groups (which are not necessarily only functional groups). Some of the factors that go to make up an organizational culture include:
a) its mission and image
b) relative importance of seniority and authority
c) relative importance of different positions and functions
d) treatment of staff and working conditions
e) role of women
f) selection criteria and career paths
g) work organization and discipline
h) management and leadership style
i) decision-making processes
j) circulation and sharing of information; communication patterns.

Table 5. Description of the consultant's role on a directive and non-directive continuum²

Multiple roles of the consultant, ordered by the level of consultant activity in problem solving, from non-directive to directive:
Reflector: raises questions for reflection
Process specialist: observes problem-solving processes and raises issues mirroring feedback
Fact finder: gathers data and stimulates thinking
Alternative identifier: identifies alternatives and resources for client and helps assess consequences
Collaborator in problem solving: offers alternatives and participates in decisions
Trainer/educator: trains the client and designs learning experiences
Technical expert: provides information and suggestions for policy or practice decisions
Advocate: proposes guidelines, persuades, or directs in the problem-solving process

Recommendations made by the consultant may have to take all, and more, of these factors into account, because to be effective those recommendations must, at some critical point, be seen by key members of the organization to be sensible, acceptable and, above all, capable of being implemented.


3.3 The change process 

It is sometimes said that the only permanent feature of
our lives today is change; and this must be the raison
d'être of the consultant: to assist in the planning and
implementation of change. Not for the sake of change 
per se, but because it is in the nature of healthy 
organizations (as with all living things) to change and 
to adapt to environmental pressures. Any or all of the 
following are subject to the process: 

a) basic set-up of the organization 

b) tasks and activities 

c) technology used

d) management structures and processes

e) organizational culture

f) staffing levels and deployment 

g) organizational performance 

h) organizational image. 

Some of these appear relatively simple. For example, 
a modest application of flow charting and work 
measurement can result in the redesign of a work area
and work flows within that area; but even the simplest 
change may have repercussions. 

It may be a cliché to say that the world, particularly
the industrial nations, is becoming increasingly 
complex, but the fact is that we live in it and must deal 
with the complexity. 

Consequently, management techniques and approaches have become correspondingly complex and more closely tied to the behavioural sciences. In particular, the as yet unassimilated impacts of information technology and of the market economy are forcing many managers to undertake fundamental appraisals of every aspect of their organizations. For example, Total Quality Management (TQM) switches the emphasis from the input perspective to the customer-oriented output perspective, and in the process deep and far-reaching adjustments can be made. But others argue that TQM does not go far enough, because the quality approach tends to be incremental: doing a little better what you are doing at the moment (which may, in any case, be wrong or misguided). Such people argue that an answer can be found in Business Process Reengineering, which they describe as the ‘management of change’. All the previous discussion of change has been spatial in nature, but the temporal dimension also controls the stability of the organization and must not be overlooked. Table 6 gives an indication of the problem.


Table 6. Time span and level of difficulty involved for various levels of change²

[Diagram: difficulty involved plotted against time involved (short to long), for levels of change including individual behaviour (3) and organizational or group behaviour (4).]





4. Stimulation of a consultancy profession 
It is clearly sensible for a developing country to 
stimulate its own consultancy potential for at least 
three reasons: 
a) to reduce expenditure on foreign consultants (many of whom are very anxious to open up new markets)
b) to make effective use of the local knowledge which only the indigenous professional can have
c) to promote national self-reliance.
In practice, the answer may be found in fruitful collaboration with foreign consultants, so that skills and knowledge are optimally deployed. One excellent reason for collaboration is to break out of the vicious circle of not being able to undertake consultancy without experience, and not being able to gain experience because of an inability to get started. Experience is possibly the most important ingredient of the consultant’s portfolio, provided he or she is always learning from that experience. Technical skills can be taught, but personal skills can only be cultivated.

The objectives of training new consultants are usually considered to be to ensure that the consultant has the necessary ability and confidence to carry out assignments, by enhancing his or her
a) analytical and creative skills
b) collaborative skills and ability to implement change
c) technical proficiency in particular fields or disciplines
d) ability to work independently and under pressure.
Some of these objectives can be met by the traditional curricula of library and information schools; others may need to be addressed elsewhere. It is no more possible to make consultants than it is to make airline pilots or brain surgeons. Recognition of the necessary skills and temperament is needed, followed by appropriate encouragement and training; but ultimately consultancy seems to be an entrepreneurial activity which develops well in a market economy.
1. Introduction

China is moving towards a new form of economic 
structure — one based upon a market economy with all 
of the problems associated with that mode of 
organization. The stresses associated with major 
economic change are being felt already and it is 
questionable whether the society can derive the full 
benefits of a market economy mode without further 
social and political change. It is to be hoped, at least, 
that the market is not allowed to negate totally the 
concept of the general good, and that some effective 
control over the harmful effects of a crude concept of 
‘market forces’ can be maintained. 

However, given that the changes are in train, it is 
clear that the emerging businesses will need access to 
external, environmental information on a previously 
unparalleled scale and that this will have significant 
consequences for the development of China’s 
information policy. The whole thrust of previous information policy has been towards support for science and industry, particularly the defense-related industries, and ISTIC has played a major role in the development of those services. The future thrust, however, will be towards business and those industries that are related to the economic future of the country. This requires the development of policy related to business and commercial information services, and the concept of a ‘socialist market economy’ will undoubtedly produce political pressures for these services to be provided by commercial organizations, rather than by public enterprises such as ISTIC. Indeed, some information brokers already exist in China, and it is likely that new modes of information service provision will arise, particularly in areas such as Guangdong and Fujian, which are the focus of intensive industrial expansion.

An information policy for the future, then, must 
give some attention to how that policy is to be 
implemented and, as regards business information 
services, that attention should focus upon how 
information needs in business are to be forecast and 
identified, so that relevant services may be developed, 
which are perceived to be useful to the new 
entrepreneurs in China and that will support the 
development of a healthy market economy. 

This paper, as a consequence, is concerned with the tools that may be used by policy makers and service planners to identify information needs in advance of the development of services, so that resources may be targeted upon those business sectors and information services that are most likely to find a ready market and a receptive audience. I propose the adoption of a coherent approach to service planning, based on four analytical tools: sectoral analysis, Porter’s five-forces analysis, the value chain, and information needs analysis.

Much of the paper is based upon work carried out in recent years in the Department of Information Studies at the University of Sheffield, into this and other aspects of information management in business, both in the UK and in Portugal (e.g. Roberts & Clifford, 1986; Roberts & Clarke, 1988; White & Wilson, 1988; Codington & Wilson, forthcoming).


2. Sectoral analysis of business and industry 

All businesses need information, and much information is acquired by relatively informal means: through conversation on the golf course or in the bar, attendance at trade fairs, the daily press (especially the financial newspapers), trade publications, radio and television, and so on. However, the more export-oriented the business, the larger and more structured the organization, and the more technology-driven the production process, the more likely it is that the business will need access to information on its markets, its competitors, and the prevailing economic and political climate (all that information generally termed ‘environmental information’). In other words, each of these features tends to make a company more information-intensive. These features vary from sector to sector of industry: e.g. the food industry in many countries is relatively low-technology-based, whereas the electronics industry is (by definition) high-technology; the leather industry is typified by the existence of many small manufacturers, whereas the pharmaceutical industry has a small number of multinational companies; the export-led businesses are those that have ready overseas markets and limited domestic markets. The examples could be extended almost ad infinitum.

Consequently, any information service seeking to 
provide information to business in general needs to 
carry out a sectoral analysis of the companies in its
region to identify those that are likely to have the
greatest need for environmental information,
competitive intelligence, and export information.
Indeed, such analyses need to be undertaken as a
matter of information policy, so that government may 
determine which information services must be 
subsidized by, or provided by, the state, rather than 
being left to the vagaries of entrepreneurial 
development. 
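The text implies a simple screening rule: sectors that are more export-oriented, larger and more structured in scale, and more technology-driven are likely to be more information-intensive, and so better targets for services. One way a service planner might operationalize that rule is sketched below in Python. The 0 to 1 scales, the equal weighting and the example scores are all hypothetical, chosen loosely to echo the text's food, electronics, leather and pharmaceutical examples; a real analysis would derive them from survey data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sector:
    name: str
    export_orientation: float    # 0 = domestic market only, 1 = export-led
    organizational_scale: float  # 0 = many small firms, 1 = few large, structured firms
    technology_intensity: float  # 0 = low-technology, 1 = high-technology

    def information_intensity(self) -> float:
        # Unweighted mean of the three factors: an assumed, not a validated, weighting.
        return (self.export_orientation + self.organizational_scale
                + self.technology_intensity) / 3


def rank_sectors(sectors: list[Sector]) -> list[Sector]:
    """Most information-intensive sectors first: candidates for targeted services."""
    return sorted(sectors, key=Sector.information_intensity, reverse=True)


# Illustrative scores only, not measured data.
sectors = [
    Sector("food", 0.3, 0.5, 0.2),
    Sector("electronics", 0.7, 0.6, 0.9),
    Sector("leather", 0.7, 0.1, 0.3),
    Sector("pharmaceuticals", 0.6, 0.9, 0.9),
]

for s in rank_sectors(sectors):
    print(f"{s.name:<16} {s.information_intensity():.2f}")
```

Equal weights are the simplest choice when there is no evidence favouring one factor; a planner with survey results could replace the mean with weights estimated from that data.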

The sectoral analysis should also take into account 
not only the individual companies that may benefit 
from such services, but also the groups of companies 
that may be too small individually, but which may 
already co-operate through trade associations, regional 
organizations, and chambers of commerce. National 
and regional government development policy must 
also be taken into account. For example, in many 
Western European countries the Chambers of 
Commerce play a significant role in information 
provision and may be seen as independent of other 
information providers. Thus, a kind of ‘information 
isolation’ may develop, in which the other information 
providers are not recognized as relevant to the work of 
Chambers. Clearly, this is likely to be less effective 
than seeking to work with other agencies, to fill gaps in 
provision and to reach those companies that are not 
members of the Chamber. 


Sectoral analysis of industry: 
the Portuguese shoe industry 


• employs about 40,000 people, but in more than 1,000 factories: an average size of fewer than 40 people to a factory

• is largely owned by single owners or families

• in 1989 exported more than 60,000,000 pairs of shoes.

• What kind of information does an industry like this need, and how best can it be provided?





A concrete illustration of the idea of sectoral analysis can be taken from Portugal: the shoe and leather industry is mainly concentrated in the North-West of the country and, therefore, companies elsewhere may have difficulty in maintaining the links that provide intelligence about what is happening in the industry. The sector is also typified by small companies: it employs about 40,000 people in more than 1,000 companies, an average of fewer than 40 people in a factory. Those factories tend to be owned by individuals or families, and in 1989 the industry as a whole exported 60,000,000 pairs of shoes; clearly, therefore, it is an important economic sector.

This sectoral analysis raises a number of questions 
about information provision: how are the links among 
companies developed so that collaborative action on 
important areas of information provision may develop? 
How can information services be developed that are economically acceptable to small producers? How, in particular, can small companies be helped to identify export markets for their products? These questions demonstrate how inextricably linked are information policy, economic policy, and the formal and informal interactions of private companies. In other words, in this instance, as in so many others, the answers cannot be left to the market.


3. Porter's five forces

Sectoral analysis is the broadest level of analysis, 
where we look at the composition of industry in general: 
turning to the company level, we can look at Porter’s 
(1985) concept of five forces acting upon a business, 
which will enable us to determine which kinds of 
information are most likely to be of relevance. 

The five forces are: the immediate competition for 
the company, in its established market, who are a 
constant threat to market share, unless there are formal 
or informal agreements on market sharing — often 
illegal in developed countries — which produce 
oligopolies in industry sectors; suppliers, who may use 
what they believe to be a monopoly position to raise 
prices, thereby constituting a threat to the maintenance 
of profit margins; customers, who may seek to drive 
down prices by threatening to take their business 
elsewhere; producers of substitute products, which 
may take away market share by creating markets for 
the substitutes — a famous example being the creation 
of the market for quartz watches, which almost killed 
off the Swiss watch industry; and new entrants to the 
industry — more or less likely depending upon the costs 
of entry. 

Different companies, and even different industry 
groups, may be subject to these forces to different 
degrees and this offers the information service provider 
a basis for targeting different services to different 
companies. For example, in sectors where the 
competition within the industry is vigorous, a
competitor analysis service, based on full-text
databases, financial analysis newsletters, and stock-market
reports, may find a ready market, whereas in
another industry where substitute products are seen as 
a threat, a new products newsletter may find buyers. 
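Targeting services by the relative strength of each force can be sketched as a lookup from forces to services. In the Python sketch below, only the two pairings the text gives (a competitor analysis service where rivalry is vigorous, a new products newsletter where substitutes are the threat) come from the source. The services for the other three forces, the 0 to 1 pressure scale and the 0.6 threshold are hypothetical placeholders that a provider would set from its own survey of the companies concerned.

```python
from __future__ import annotations

from enum import Enum


class Force(Enum):
    RIVALRY = "immediate competition"
    SUPPLIERS = "suppliers"
    CUSTOMERS = "customers"
    SUBSTITUTES = "substitute products"
    NEW_ENTRANTS = "new entrants"


# The first two pairings come from the text; the rest are hypothetical.
SERVICE_FOR_FORCE = {
    Force.RIVALRY: "competitor analysis service",
    Force.SUBSTITUTES: "new products newsletter",
    Force.SUPPLIERS: "supplier monitoring service (hypothetical)",
    Force.CUSTOMERS: "customer intelligence service (hypothetical)",
    Force.NEW_ENTRANTS: "new-entrant watch (hypothetical)",
}


def target_services(pressure: dict[Force, float], threshold: float = 0.6) -> list[str]:
    """Services for forces whose surveyed pressure (0..1) reaches the threshold,
    strongest force first."""
    ranked = sorted(pressure.items(), key=lambda item: item[1], reverse=True)
    return [SERVICE_FOR_FORCE[force] for force, level in ranked if level >= threshold]


# A company facing vigorous rivalry but little threat from substitutes.
print(target_services({Force.RIVALRY: 0.9, Force.SUBSTITUTES: 0.3, Force.CUSTOMERS: 0.6}))
# ['competitor analysis service', 'customer intelligence service (hypothetical)']
```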









Five forces analysis

[Diagram: the forces acting on a business, including new entrants to industry, suppliers, customers and substitutes.]





Clearly, the use of five-force analysis demands a 
careful survey of the industries and companies to be 
targeted, and is not to be undertaken without cost. 
However, it does provide a sound basis for the 
development of market research instruments for the
information provider, rather than that provider relying 
upon rule of thumb or the extension of established 
practice. 


4. Value chain analysis 
The ‘value chain’ is another concept used by Porter 
(1985), and we can adopt it to discover which parts of 
an organization are most likely to benefit from 
information provision — which parts, in other words, 
are likely to be information-intensive in the nature of 
their operations. According to Porter: 
The value chain disaggregates a firm into its
strategically relevant activities in order to
understand the behaviour of costs and the
existing and potential sources of differentiation.
A firm gains competitive advantage by
performing these strategically important
activities more cheaply or better than its
competitors. (Porter, 1985: 33-34)
and the value chain has ‘five generic categories of 
primary activities’: inbound logistics, operations, 
outbound logistics, marketing and sales, and service. 
In the diagram below these are modified, but the 
relationship will still be visible. 

If these stages in the value chain have relevance for 
competitive advantage, it is clear that information 
services directed towards improving their efficiency 
or effectiveness will have strategic significance for the 
organization. 


Value chain analysis

[Diagram, adapted from Hunsicker (1989): stages of the value chain, with example businesses (drug design, personal computers, fashion textiles, J-I-T supply, fast food outlets) associated with particular stages.]








Hunsicker (1989) (from whose paper the diagram has been adapted) has suggested that some parts of the value chain are more important than others for different kinds of companies. For example, we can readily see that leading-edge pharmaceutical companies rely heavily upon their research and development departments for the developments that bring profit, whereas the perfume industry relies heavily upon marketing. We can take this idea forward by suggesting that the areas of the value chain that are most significant should be the prime focus for information systems and services. In some cases the critical information will be that generated inside the organization; in others, however, external, environmental information is likely to play an important part (the pharmaceutical industry is a case in point); and in yet others a mixture of internal systems and external information provision will be necessary.
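Making the most significant parts of the value chain the prime focus for information systems and services amounts to ranking the stages by importance for each type of company. A hypothetical Python sketch follows. The stage list uses Porter's five primary activities plus "research and development" (which Porter treats under the support activity of technology development, and which is added here because the pharmaceutical example depends on it); the importance weights are invented for illustration.

```python
from __future__ import annotations

# Porter's five primary activities plus R&D (an assumption for this sketch).
STAGES = (
    "research and development",
    "inbound logistics",
    "operations",
    "outbound logistics",
    "marketing and sales",
    "service",
)


def prime_focus(weights: dict[str, float], top_n: int = 1) -> list[str]:
    """Return the top_n value-chain stages by (hypothetical) importance weight."""
    unknown = set(weights) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [stage for stage, _ in ranked[:top_n]]


# Invented weights for the two company types mentioned in the text.
pharmaceutical = {"research and development": 0.5, "operations": 0.2,
                  "marketing and sales": 0.2, "service": 0.1}
perfume = {"research and development": 0.1, "operations": 0.2,
           "marketing and sales": 0.6, "service": 0.1}

print(prime_focus(pharmaceutical))  # ['research and development']
print(prime_focus(perfume))         # ['marketing and sales']
```

Validating stage names guards against typos in survey coding, since a misspelled stage would otherwise silently form its own category.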

As in the case of five-forces analysis, although it may be possible to identify the critical areas of an organization's value chain on an a priori basis, it is likely that the information provider (whether an internal information unit or an external, commercial provider) will need to undertake surveys to identify the critical sector or sectors.


5. Information needs research 
Finally, we can consider the role of information needs research. In fact, we can view the preceding ‘tools’ as part of the total process of information needs research, but the term is usually used to identify those studies that involve sample surveys or case studies of individual information-seeking behaviour. Here, we come to the ‘micro’ level of needs analysis: the identification of individuals and roles around which information needs of different kinds cluster. We have to realize that, even within the information-intensive areas of information-intensive companies, we still have to identify the target individuals or sections, and market our services to those individuals, if we are to be successful in our business of information service provision. What, then, can information needs research tell us that may be helpful in designing and delivering services?

Information need is not so much an innate human need as a response to a situation in which the basis for decision-making is lacking (Wilson, 1981). The situations in which the need for information is likely to arise in business and industry are diverse and, as Mintzberg (1987) reminds us, strategy may emerge from the decisions taken in response to the circumstances of the moment as much as, if not more than, it is developed in formal strategic planning.

Some writers have assumed that business information needs can be neatly categorized under headings such as ‘strategic planning’, ‘management control’, and ‘operational control’ (e.g. Keen & Morton, 1978). However, these categorizations are much fuzzier in the real world than they are in theory, and the neatness of the classification disappears when one probes into the real information needs of business people. If, instead of using theoretical categories, one uses organizational divisions such as those represented in the value chain (and in supporting, organization-wide functions such as personnel management and financial control), it becomes evident that different individuals, pursuing different kinds of tasks within functional divisions, overlap quite significantly in their needs for information. For example, in a Sheffield study (White, 1986) it was found that both financial and marketing information were regarded as ‘important’ or ‘very important’ by people in both strategic planning and operational control roles.
The fact that the potential market for information is
so general in firms suggests that, from the point of
view of the information provider, we need to examine
why so little organizational significance is attached to
an aspect of business success that individuals find
important. We found that people in all kinds of roles
needed information, and that, although some clear
relationships between role and information type
appeared, in general the picture was much more diffuse
than proponents of particular models had suggested.
For example, in interviews with nineteen senior
executives in a major food-processing company,
covering all sectors of the firm, the following five
categories (out of 18 presented) were regarded as ‘very
important’ by ten or more of the respondents:

• company general accounts
• stock levels in the company
• delivery performance
• company targets, and
• new product information.

Note that only one of these types (new product information)
has connotations of access to external sources.
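The rule used above, counting a category when ten or more of the nineteen executives rated it ‘very important’, amounts to a simple tally with a threshold. The Python sketch below assumes each interview is stored as a mapping from category to rating; the sample ratings are invented for illustration and are not the study's data.

```python
from collections import Counter

def very_important_categories(responses, threshold):
    """Return (category, count) pairs rated 'very important' by at least
    `threshold` respondents, most frequently cited first.

    `responses` is a list of dicts mapping category name -> rating string
    (an assumed recording format, not the study's own).
    """
    counts = Counter()
    for answer in responses:
        for category, rating in answer.items():
            if rating == "very important":
                counts[category] += 1
    # Keep only the categories that meet the threshold
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]

# Hypothetical ratings from three respondents (not the survey's data)
sample = [
    {"stock levels": "very important", "delivery performance": "important"},
    {"stock levels": "very important", "delivery performance": "very important"},
    {"stock levels": "important", "delivery performance": "very important"},
]
print(very_important_categories(sample, threshold=2))
```

With nineteen real interviews the call would simply use `threshold=10`.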

A similar picture was presented in another
investigation at Sheffield into the use of external
information sources in business (Roberts & Clifford,
1986). The authors noted that: ‘When presented with a
list of types... of information, respondents indicated a
spread of demand over all the categories provided’.


The table below shows the results:

Information/data relating to...        No. of respondents
Marketing
Products
Exporting
Finance
Competitors
Patents
Production
Health/Safety
Personnel
Suppliers
European Community
Standards
Premises

There are some interesting similarities, as well as
differences, in the results of a survey of industries in
Northern Portugal (Correia et al., 1991) which covered
188 firms. The table below shows the result of
presenting a list of information types to respondents.
The table shows two things: the results you get
depend upon how you categorize information types,
and the importance of local circumstances. Portugal is
a relatively recent entrant into the European Community
and, as a result, more business incentives are now
available under the various programmes of the
Portuguese government than ever before. Naturally,
businessmen are interested in those incentives, how
they may apply for them, whether they are eligible,
and so forth. The incentives also have strategic
significance, not only for the individual business, but
for Portugal as a whole: it is scarcely surprising,
therefore, that information under this heading should
be at the top of the list. Otherwise, you will see that
information on markets and products ranks just about as
highly in Portugal as it does in the UK.


BUSINESS INFORMATION NEEDS - PORTUGAL

Information types        % respondents
Incentives
Legal issues
Markets
Products
General business
Sectoral studies
EEC
Science & tech.
Statistics
Fairs/exhibitions
Macroeconomics
Norms/standards
Customs/excise


This is a brief review of some limited aspects of 
information needs analysis, but even this shows that 
business people can readily understand the idea of 
information types and their relevance to the business. 
They also recognize the idea of the strategic significance 
of information and that external, as well as internal,
information may be of value to the firm. We can also
see that local or national circumstances may dictate (a)
how information types are categorized, and (b) the
perceived importance of different types of information.
Clearly, for the Chinese situation, studies of information
need should be undertaken in China, preferably using
methodologies that allow comparison with investigations
elsewhere. Where such studies are undertaken,
care must be taken to collect information on examples
of problems and the information used to solve those
problems, to fill out the basic framework of information
types.


6. Conclusion 

The thrust of this paper is that it is not necessary to go 
blindly into the business of information provision for 
business and industry. Information workers are all too 
ready to try to do everything for everyone, and the 
resources to do this satisfactorily rarely exist. We have 
to approach the business of information in a businesslike 
manner. I have tried to demonstrate that the analytical 
tools exist to ensure that the right services are delivered 
to the right people in the most receptive organizations. 
If we can achieve this, we can go on to achieve more, 
but, without some initial success, long-term 
performance is likely to be poor. 

The key to the success of information policy
is effective performance of the activities defined by
the policy. Effective performance is only achievable if
we lay down a secure basis for the development of
services, and the analytical tools presented here offer
the service provider a way of ensuring that the
basis is secure.

Abstract


The publication in 1993 of the 1991 Census on CD-ROM by Chadwyck-Healey makes the data much more accessible,
cheaper and easier to use. The data is available in a variety of different formats to suit different user groups. A number
of software products are available to assist the user in the exploitation of the Census data. The Census Data on CD-ROM
is particularly likely to be important in public and academic libraries. As with the development of any CD-ROM-based
service, there are a number of strategic and practical day-to-day issues that will need to be addressed.


1. Introduction 

In August 1993, the 1991 census data were published 
on CD-ROM by Chadwyck-Healey. The publication 
of these data represents a unique opportunity for 
businesses to acquire and utilize a detailed set of 
demographic data. Statistics are produced for 
approximately 140,000 small areas (averaging 200 
households each) throughout Great Britain. The 
thousands of counts that are produced about the 
population living and working in each small area 
include such classifications as age, social class, ethnic
groups, employment, qualifications, illness, housing 
and car ownership. 

Historically, the census has been invaluable to 
central government, local authorities, utilities and 
large commercial organizations. During the 1990s 
it will be a vital tool in marketing and targeting for 
almost every commercial and public organization. 
The enormous advances in technology since the last 
census will make 1991 data much more accessible, 
cheaper and easier to use. Powerful, low cost PCs 
capable of running GIS and mapping systems will
allow almost every organization to map census data 
with other databases for improved market comparison 
and analysis. 

Most important of all, the census enumeration 
district boundaries will be capable of being displayed 
as digital maps for the first time. Through mapping and 
GIS software, this will considerably aid the analysis 
and interpretation of data, extending the applications 
of census information. 

Both public and academic libraries have made the 
census data in printed form available to users. The 
availability of census data on CD-ROM has significant 
implications for the kind of service that might be 
offered by libraries, based on these data. 

This article outlines the nature of the data that will 
be available and looks at some of the range of products 
that are being marketed to assist organizations to fully 
exploit the data. It goes on to discuss how libraries 
might make use of these facilities in order to improve 
their service to users. 


2. More about the database 

The data from the 1991 census are available on CD-ROM,
including the complete Small Area Statistics (SAS)
and Local Base Statistics (LBS), together with easy-to-use
mapping and presentation software. The census data
are combined with vector boundary mapping
at various levels from national to Enumeration District.

The census contains more information than any 
previous census, providing the most comprehensive 
demographic and socio-economic profile of Great 
Britain ever produced. In addition the data analysis 
facilities are significantly improved so that the data 
can be accessed and used easily. 

SAS tables contain a full range of statistics covering
basic demographic characteristics, housing, household
composition and economic activity.

There is information on age, sex, marital status,
relationship to head of household, whereabouts on
census night, usual place of residence, migration, country
of birth, employment status, occupation and housing
tenure.


3. Applications 

The 1991 census provides vital information for
researchers and analysts in many fields including areas
such as human geography, demographics, economics
and planning, sociology, ethnology, local government,
medical research, retail planning, market research,
housing management and business strategic development.

The ability to combine the user’s own information with
census data and to present the data in the form of
tables, maps and charts further enhances the value of
the data.

In addition, the 1981 census data was published on 
CD-ROM by Chadwyck-Healey in September 1992 
and comparative use of these two databases may offer 
valuable insights. 

The 1991 Census on CD-ROM, from Chadwyck- 
Healey, is published in a variety of different 
configurations designed to meet the needs of different 
users. For example, packages are available for: 
Central and local government

Containing the complete Small Area Statistics (SAS) 
and Local Base Statistics (LBS) datasets and boundary 
maps at Enumeration District (ED) level and above 
and equivalent areas in Scotland, the whole of Great 
Britain or individual counties. 


Health 
Containing the complete SAS and LBS datasets and
boundary maps at ED level and above and equivalent
areas in Scotland, aggregated for District and Regional
Health Authority areas. Available for the whole of Great
Britain or individual Regional Health Authority areas.


Higher education 

Containing the complete SAS and LBS datasets and 
boundary maps at ED level and above and equivalent 
areas in Scotland. Available for the whole of Great 
Britain. 


Public libraries 

Containing the complete SAS and LBS datasets and 
boundary maps at ED level and above and equivalent 
areas in Scotland. Available for the whole of Great 
Britain or individual counties. 


Schools 

Containing approximately 2,500 selected statistics from 
the SAS and LBS datasets and boundary maps at ED 
level and above and equivalent areas in Scotland. 


4. Software for census data analysis 

A number of software products have become available 
on the marketplace that assist the user to exploit the 
census data. A review of some of the features that these 
products offer demonstrates the range of potential 
analyses that may be performed on the data, and 
associated applications. In addition to the special 
software products described first below, many 
organizations will wish to integrate their Census data 
with other geographical data, through their GIS. 

The SASPAC Census analysis software runs on 
PCs, mid-range servers and mainframe computers. 

If this is not sufficient then you may wish to 
analyse data in other ways. For example MAP 91 from 
Claymore Services Ltd is designed for analysing and 
mapping census data on a PC under Windows, and is 
available to run on standalone PCs or on a PC network.
Figure 1 summarizes some of the features of MAP 91.

A further option is C91 from Powys County Council,
another PC-based software product, which will:

• produce standard census tables
• allow users to design their own tables
• allow users to build aggregate datasets for new areas
• export census data for use with spreadsheets, databases, graphing programs and mapping packages
• import users' own data to use alongside census data in reports and tables.
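Building an aggregate dataset for new areas and exporting it for a spreadsheet, two of the C91 functions listed above, can be sketched as summing ED counts under a user-supplied mapping and writing the totals as CSV. This is an illustration of the general idea, not C91's own method; the ED codes, area names and counts are hypothetical.

```python
import csv
import io
from collections import defaultdict

def aggregate_areas(ed_counts, area_of_ed):
    """Sum census counts from EDs into user-defined areas.

    ed_counts:  dict ED code -> dict of count name -> value
    area_of_ed: dict ED code -> name of the user's new area
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ed, counts in ed_counts.items():
        area = area_of_ed[ed]
        for name, value in counts.items():
            totals[area][name] += value
    return {area: dict(c) for area, c in totals.items()}

def to_csv(area_totals, fields):
    """Write aggregated counts as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["area"] + fields)
    for area in sorted(area_totals):
        writer.writerow([area] + [area_totals[area].get(f, 0) for f in fields])
    return buf.getvalue()

# Hypothetical ED codes and counts, for illustration only
eds = {
    "ED001": {"residents": 450, "households": 180},
    "ED002": {"residents": 510, "households": 205},
    "ED003": {"residents": 390, "households": 160},
}
catchment = {"ED001": "North", "ED002": "North", "ED003": "South"}
totals = aggregate_areas(eds, catchment)
print(to_csv(totals, ["residents", "households"]))
```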
A special package for use in educational institutions,
SCAMP-CD, incorporates:
• 1991 census data
• digitized census area boundaries
• digitized cartographic data
• addresses, postcodes and grid references.
These are structured hierarchically at national, county
and district level. The data can be displayed and output
in the form of maps.

SASPAC from MVA Systematica and the London
Research Centre is a further product designed for the
analysis of the 1991 Census of Population local statistics.


SASPAC enables you to:

• load 1991 Census Small Area Statistics (SAS), Local Base Statistics (LBS), Special Workplace Statistics (Sections A & B), 1981 census and similar small area datasets into compressed files for speedy processing
• select areas by area identifier, values of counts, or distance from a point
• manipulate the statistics to create new variables, define your own areas, rank and sort areas
• print full tables complete with additional text (which is not included in the Census Offices’ data files), individual counts, and profiles of areas
• output data in various file formats suitable for other specialist software, such as mapping, statistical and graphical packages.
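Two of the SASPAC operations listed above, selecting areas by distance from a point and creating a new variable by which areas are then ranked, can be sketched in a few lines of Python. This is not SASPAC's interface: the field names, the use of easting/northing grid references in metres and the ‘households without a car’ count are assumptions chosen for illustration, and the figures are invented.

```python
import math

def select_within(areas, centre, radius_m):
    """Select areas whose grid reference lies within radius_m metres of centre."""
    cx, cy = centre
    return [a for a in areas
            if math.hypot(a["easting"] - cx, a["northing"] - cy) <= radius_m]

def rank_by_share(areas, numerator, denominator):
    """Derive a percentage variable and return (id, %) pairs, highest first."""
    ranked = []
    for a in areas:
        share = 100.0 * a[numerator] / a[denominator] if a[denominator] else 0.0
        ranked.append((a["id"], round(share, 1)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Hypothetical areas: grid references in metres, invented counts
areas = [
    {"id": "ED001", "easting": 430000, "northing": 387000, "households": 180, "no_car": 72},
    {"id": "ED002", "easting": 430800, "northing": 387400, "households": 205, "no_car": 41},
    {"id": "ED003", "easting": 436000, "northing": 392000, "households": 160, "no_car": 96},
]
nearby = select_within(areas, centre=(430000, 387000), radius_m=1000)
print(rank_by_share(nearby, "no_car", "households"))
```

Straight-line distance is adequate here because grid coordinates are planar; ED003 lies well outside the 1 km radius and is excluded before ranking.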

ED-LINE is a digitized version of the authorized
national census boundaries. Previously, the lack of a
definitive correlation between census areas and
postcoded addresses had led organizations to restrict
market analyses to the 9,000 postal sectors within
Great Britain. For many purposes this was a crude
basis of analysis, since each sector comprises some
2,000 households. Using ED-LINE, the marketeer can
now achieve a tenfold improvement in targeting: each
ED contains an average of fewer than 200 households.
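The ‘tenfold’ figure is simple arithmetic on the two average unit sizes quoted above:

```latex
\frac{2000 \text{ households per postal sector}}{200 \text{ households per ED}} = 10
```

Because EDs average slightly fewer than 200 households, the true ratio is, if anything, a little above 10.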


Figure 1. Features of MAP 91

• Reads census data from SASPAC91, C91, Excel, Lotus 1-2-3 and other proprietary databases and spreadsheets.
• Interfaces with digitized census area boundaries generated by ED-LINE, ED91 or the GRO (for Scotland).
• Maps census area data and also point data (grid-referenced) and cartographic data.
• Optionally maps census data for special EDs.
• Allows for user aggregation of census areas, in particular the merging of restricted/suppressed wards or EDs.
• Provides simple statistical and spatial functions.
• Outputs census maps to other packages via the Windows clipboard or in graphic formats.
• Outputs census maps to any printer or plotter supported by Windows.
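The merging of restricted or suppressed areas mentioned above can be pictured as folding each area that falls below a confidentiality threshold into a nominated neighbour. The sketch below is a hypothetical illustration: the threshold values, field names and the choice of neighbour are assumptions, not MAP 91's actual rules.

```python
def merge_suppressed(areas, neighbour, min_residents, min_households):
    """Fold areas below the (assumed) thresholds into a nominated neighbour.

    areas:     dict area code -> dict with 'residents' and 'households' counts
    neighbour: dict small-area code -> code of the area it is merged into
    Returns a new dict; the input is left unchanged.
    """
    small = {
        code for code, c in areas.items()
        if c["residents"] < min_residents or c["households"] < min_households
    }
    merged = {code: dict(c) for code, c in areas.items() if code not in small}
    for code in small:
        target = neighbour[code]
        # The receiving area must itself be large enough to publish
        if target not in merged:
            raise ValueError(f"{code} must merge into an area above the threshold")
        for name, value in areas[code].items():
            merged[target][name] += value
    return merged

# Hypothetical areas and thresholds, for illustration only
areas = {
    "ED001": {"residents": 450, "households": 180},
    "ED002": {"residents": 30, "households": 12},   # below both thresholds
    "ED003": {"residents": 390, "households": 160},
}
result = merge_suppressed(areas, {"ED002": "ED001"}, min_residents=50, min_households=16)
print(result)
```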
The transition to the tighter ED focus has been
encouraged by the OPCS's issue of a definitive
directory matching postcodes with Census Enumeration
Districts.

ED-LINE allows the mapping of data from existing 
customer and market databases with census information 
analysed down to the ED level. Graphical displays of 
this information provide a meaningful and detailed 
correlation of customer and market profiles. 
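Linking a customer file to census areas through a postcode-to-ED directory, as described above, is essentially a lookup followed by a count. The sketch below assumes the directory is available as a simple mapping from normalized postcode to ED code; the postcodes, ED codes and household counts are hypothetical, and ‘customers per 100 households’ is just one possible measure of market penetration.

```python
from collections import Counter

def customers_per_ed(customer_postcodes, postcode_to_ed):
    """Count customers in each ED using a postcode-to-ED directory.

    Postcodes missing from the directory are counted under None.
    """
    counts = Counter()
    for pc in customer_postcodes:
        key = pc.upper().replace(" ", "")  # normalise case and spacing
        counts[postcode_to_ed.get(key)] += 1
    return counts

def penetration(counts, households):
    """Customers per 100 households for each ED with a known household count."""
    return {
        ed: round(100.0 * n / households[ed], 1)
        for ed, n in counts.items()
        if ed in households and households[ed]
    }

# Hypothetical directory, customer file and ED household counts
directory = {"S102TN": "ED001", "S102TP": "ED001", "S118AB": "ED002"}
customers = ["S10 2TN", "s10 2tp", "S11 8AB", "S1 1ZZ"]
counts = customers_per_ed(customers, directory)
print(penetration(counts, {"ED001": 180, "ED002": 205}))
```

Counting unmatched postcodes under `None` keeps them visible, so gaps in the directory can be reported rather than silently dropped.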

The product ED-LINE*MARKETEER is an 
alternative which provides simplified boundaries for 
PC-based national or regional marketing applications. 

Many organizations will seek to integrate census 
data into their GIS. They may use one of the wide 
range of GIS that are available on the marketplace. 
Figure 2 lists some of the key products. GIS software 
runs on a variety of platforms, and most organizations 
will customize the software to suit their own 
applications. Usually, external data, such as that from
the census database, will be interfaced with
organizational data relating to the geographical spread 
of markets for the organization's products or services. 


Figure 2. Some GIS software products 


Product | Supplier | Platforms
PAFEC | PAFEC Ltd | Workstations, Mainframes, Networked PCs and Terminals
NAPIT | Encompass Systems Ltd | PC
IDRISI | Graduate School of Geography, Clark University | PC
ATLAS GIS | Adopt Scientific Micro Systems Ltd | PC, Macintosh
ARC/INFO | Environmental Systems Research Institute Inc | PC, Workstations, Minicomputers and Mainframes
ALLIANCE | ICARE International |
Map Info | Geographical Data Capture Ltd | PC, Workstations, Minicomputers and Mainframes
GEO/NAVIGATOR | PAX Technology | PC, Macintosh
Map Graphics | Remote Sensing Services Ltd | Macintosh
GEOPIN2/GEOMAPS | Pinpoint Analysis Ltd | PC
Natural Geographic | Origin IT Redhill Ltd | Mainframe, Minicomputer, PC
ARGIS | Unisys | Mainframe, Minicomputer, PC
Lifestyle Mapping | NDL International |


5. Using the 1991 Census on CD-ROM in libraries 
Academic and public libraries that already have some 
experience of the use of CD-ROMs are most likely to 
find it easy to integrate a further CD-ROM disc 
containing the census data into their activities. In these 
contexts the library may mount this disc as one of 
several on a juke box linked into a network, so that a 
number of users can exploit the facilities at the same 
time. Libraries for which this is a first venture into CD-ROM
are more likely to mount the product in stand-alone
mode. As with the development of any CD-ROM-based
service, there are a number of strategic and
practical day-to-day issues that will need to be
addressed. The issues can be grouped into the following
categories:

1. The terms of use and nature of use of the data on the
CD-ROM - in this instance restrictions on the
downloading of data may be important. Users may
wish to download data to manipulate on their own
PC, possibly with the assistance of a spreadsheet
package or one of the products discussed earlier. 
The extent to which the library may choose to make 
some of the special purpose software described 
earlier available in a library or organizational 
network may be an issue that needs to be debated. 
Also in organizations where users have remote 
access to CD-ROMs over a network, the library 
manager may need to consider what control can and 
need be exercised over the use of the data. 

2. Resources — any CD-ROM based information 
service costs money to maintain. The source of 
resources, whether it be by charging the users or 
drawing resources from some other budget, must be 
identified. 

3. Impact on staff — the introduction of CD-ROM 
offers an ideal opportunity for the end-user to 
perform independent searching of the database, and 
this will be particularly valuable with this database. 
A limited number of specialist users are likely to 
become particularly expert in using this database, 
but staff may need to assist and train new users and 
to do this will need some appreciation of the rather 
specialist functions of this database. 

4. Housekeeping — all CD-ROMs need to be acquired, 
catalogued, and possibly issued. Some means of 
control and issue is necessary in order to avoid discs 
being mislaid. 

5. Other services — the provision of this CD-ROM 
based service may have implications for the use of
the census data in print form and may lead to users
requesting access to other data on CD-ROM. 

6. Integration - CD-ROM in most libraries will be
only one of a number of computer based information
services provided by a library for its users. It is 
important that CD-ROM can be integrated into the 
wider environment, and that the user can access 
other institutional databases and software products 
via the same workstation. 


6. Conclusions 

The availability of the 1991 Census on CD-ROM
poses a challenge for libraries. Many academic and 
public libraries have made census data in printed form 
available to users. The availability of the data on CD- 
ROM facilitates much more imaginative and flexible 
application of the data. It is important that libraries 
play an appropriate role in making this information 
available to users. 
An evaluation of InfoMapper™ software
at Trainload Coal


Kirsten Barclay and Charles Oppenheim


Department of Information Science, University of Strathclyde, 26 Richmond Street, Glasgow G1 1XH 


Abstract 

A number of methodologies exist for managing information as an asset. Horton's information entity modeling
technique, entitled Information Resource Mapping, is amongst the best known. He also developed software,
InfoMapper™, which is based on the principles of the information mapping methodology.

Hitherto, no publications have appeared describing the application of InfoMapper™ in a commercial
organisation. This paper describes experiments to test the software and the underlying methodology to assess their
applicability in commercial environments. Trainload Coal, part of the British Railways Board business Trainload
Freight, agreed to co-operate in this test to evaluate the software.

The paper concludes that the software is too slow and too US-biased for general applicability. The usefulness
and relevance of InfoMapper™ are limited. However, the underlying ideas were ones which Trainload Coal felt
they could adopt. It is apparent, then, that information resource methodologies do have potential. However, before
software such as InfoMapper™ can be used extensively, users must test it in a wide range of environments to
assess its usefulness and suitability, and some of the problems with the software must be eliminated.


Introduction 

Information and organizations 

It is a truism that the collection, storage, manipulation
and retrieval of information are much easier due to
improved hardware and software which are now
readily available. However, many of those
information handling technologies are leading
organizations into a ‘quagmire of information
overload’. At the same time, as Orna notes, there is
often more readiness to invest money in expensive
IT, mainly because it is seen as keeping up with the
times, than to dedicate much time and thought to
consideration of planning aspects, such as:

• What information does the organization need to achieve corporate objectives?
• How do people need to use information to do their work properly?
• How should information flow inside the organization and between the organization and the outside world?
• What makes up the totality of the organization's information resources?

If these issues are ignored, problems can develop 
within an organization. As a result, many organiz- 
ations do not fully exploit their information resources. 

It is another truism that management often sees 
information services as dispensable when they have 
to make savings. Although the value of information 
has been recognized, it is still not perceived as being 
as important as other resources. The onus is therefore 
on information professionals to ensure that 
information resources are seen as important and are 
controlled and managed in such a way as to maximise
their potential value. 


The role of information management 

Lewis defined information management as: ‘the
integrative management of a broad range of
information/data/libraries, human, financial and
technological resources, for the satisfaction of user
needs, in pursuit of improved effectiveness, increased
profits, better services’. Information managers have
long realized that they need to be aware of the
importance of supplying all types of information to
the company and strive to promote the place of
information resources at board level. ‘Information
management’ has found its way into the vocabulary
of top management. However, whether top managers fully
understand the concept is another matter. For
example, Aslib undertook a study of 500 companies
to seek information about the nature and role of
information management within specific types of
organization.

The results showed that not one company reported that all
its information resources were included in a co-ordinated
approach to information management.
Similarly depressing results were found in a more
recent Harris survey, indicating that a sample of 150
companies from The Times Top 1000 relies heavily
on internal financial information in strategic
decisions. There is less reliance on competitor
information, and the respondents reported very
limited use of external information about the
economic, social and political environment.


Information Resources Management (IRM) 

Information Resources Management (IRM) is a
management activity concerned with information
assets, or the content of information, and the
people through whom an organization handles its


Aslib Proceedings, vol.46, no.2, February 1994. pp.31-42 







information. Horton and Marchand described IRM
‘as a stage in a century long development of
information management strategies and techniques’.
They claim this development comprises five stages:
1. Paperwork Management (19th century to 1950s)
2. Centralization of data processing (1960s to early 1970s)
3. Information Resources Management (1970s to early 1980s)
4. Competitive Business Intelligence (1980s to 1990s)
5. Strategic Information Management (1990s onwards)


Instead of looking at information as an overhead
expense, IRM looks at it in the following ways:

• it has to be seen as something of fundamental value, like money, capital goods, labour and raw materials;
• it is something with measurable characteristics, such as methods of collection, uses, and a life-cycle pattern;
• it is something that can be capitalized or expensed, and cost accounting techniques can be used to control it;
• it is an input, which can be transformed into useful output(s) that is (are) beneficial to achieving organizational goals and objectives.


In the past, the data processing departments of
organizations were under central control. However,
with the arrival of personal computers, the idea of
information as a unified resource, if it had existed
previously, was frequently lost. Now, the
development and introduction of Local Area
Networks (LANs) and the need to standardize client/user
architectures could return us to central control
over files. Such infrastructures still have information
management problems. Shared ownership of
information can lead to ambiguities regarding who is
in ‘charge’ of the information. Management must
resolve such problems if it wishes to achieve effective
use of the technologies and their inherent information.

John Diebold popularized IRM with his
pronouncement that ‘the corporations that will excel
in the 1980s will be those that manage information as
a major resource’. The IRM idea has enjoyed
increasing popularity since its inception. For example,
Aslib recently announced the formation of an IRM
Network so that members interested in IRM can meet
to develop the ideas further.

IRM is a process which ‘seeks to harness
information for the benefit of the organization as a
whole by exploiting, developing and optimizing
information resources’. Thus, IRM is the managerial
link between the corporate information resources
and the organization's goals and objectives.

To achieve this link, Horton believes that every
organization should apply management disciplines,
tools and techniques to the information resource just
as it applies these disciplines, tools and techniques
to other organizational resources. By doing this, the
information resources of the organization are
managed along with the other organizational
resources. However, one of the difficulties with this
is that the other resources, e.g. human, financial and
physical, can be measured or counted, whereas information
cannot, or only with difficulty. Information
is intangible and subjective, and its value depends on
the use to which it will be put. For example, management
information is only valuable insofar as it
contributes to decisions and any action taken.


Information resource mapping 

The idea of Information Resource Mapping was first
developed by Best. The idea was subsequently
popularized by Horton. It is the best known, and
probably most widely used, IRM technique.
Information is intangible, inexhaustible and
immeasurable. How can it be analysed and
managed?

The answer mapping gives is: information can be
managed because it can be modelled. A manager
who models an organization's information resources
as they would any other organizational resource will
be able to put the organization's information
resources into perspective by understanding the
contexts within which they are used. If the manager adopts a
realistic perspective and takes action to exploit the
resources fully, the resources will work to the
organization's advantage.

Information resource models can focus on the
elements of an organization's information resources
and how they interlock. Alternatively, they may
focus on the flow of information within an
organization. Either way, mapping provides a unique
approach to dealing with organizational information
resources and provides a ‘map’ of all the information
resources that exist within a particular organization.
Horton and Burk recommend a ‘discovery process’
based on the information resource entities (IREs)
used by the organization. IREs are ‘a configuration
of people, things, energy, information and other
inputs that have the capacity to create, acquire, provide,
process, store or disseminate information; the entities
are the information holdings and information handling
functions, that are, or should be, or could be, managed
as organizational resources’. Horton simplifies this
by labelling such things as Sources, Services and
Systems. Horton and Burk's book provides further
details on these issues, as well as on mapping
techniques such as the ‘discovery process’.

The information resources inventory approach
has been developed further to incorporate the idea of
a Database of Databases. This idea originated in the
United States (US), reflecting both the
Federal and State Governments' long history of trying
to establish a ‘Database of Databases’ that could
serve as a single one-stop service facility, whether
walk-in, telephone or electronic dial-up.

One such example is InfoFind. This system was
developed in response to the need to begin managing
information as a resource within the US state of New
Jersey. InfoFind is an automated directory of state
information sources. It also serves as
an inventory of state information and as an information
locator. Users can identify the state workers who are
most knowledgeable about the information sources
(information custodians). They can also derive the
properties and characteristics of state information.
The system is used for all these intended functions.
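InfoFind's role as an information locator, matching a subject to the sources held and to the custodians who know them, can be sketched as a lookup over directory entries. The entry fields, titles, subjects and custodian names below are hypothetical; the paper does not describe InfoFind's record structure.

```python
from dataclasses import dataclass

@dataclass
class LocatorEntry:
    """One entry in a directory of information sources (hypothetical fields)."""
    title: str
    subjects: tuple
    custodian: str  # person most knowledgeable about the source

def find_custodians(directory, subject):
    """Return (title, custodian) pairs for sources indexed under `subject`,
    matching case-insensitively and sorted by title."""
    wanted = subject.lower()
    return sorted(
        (entry.title, entry.custodian)
        for entry in directory
        if wanted in (s.lower() for s in entry.subjects)
    )

# Invented entries, for illustration only
directory = [
    LocatorEntry("Road traffic counts", ("transport", "roads"), "A. Smith"),
    LocatorEntry("Rail timetable archive", ("transport", "rail"), "B. Jones"),
    LocatorEntry("Water quality reports", ("environment",), "C. Brown"),
]
print(find_custodians(directory, "Transport"))
```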


An overview of InfoMapper™ 

Horton has recently launched the InfoMapper™ PC
software. This software, it is claimed, makes it
easy to undertake the preliminary resource inventory.
Although brief descriptions have appeared,
practical experiences of the use of InfoMapper™
have not been reported in the literature. This paper
describes a test of the software in a commercial
environment.


What is InfoMapper™? 
InfoMapper™ is a ‘decision-support tool and expert 
system designed to assist information managers and 
other information professionals in planning, 
managing and controlling all of the manual and 
automated information resources used by their 
organizations.’ InfoMapper™ is PC software that
allows users to compile an inventory of all their 
manual and automated information sources, services 
and systems, regardless of the media, location, or 
whether in the form of data. Use of InfoMapper™, it 
is claimed, results in a PC database that serves both 
as an information resources management aid and as 
an information locator aid. 

The InfoMapper™ package comes with a run-time
version of Ashton-Tate’s dBase IV. It sells for $595
(approximately £350). InfoMapper™ helps to do two
things:

• First, efficiently undertake an information
inventory, classify and edit records, enter the data
online, and then produce a variety of reports and
indexes needed to manage, access, and control all
manual and automated information resources
utilized by the organization;

• Second, undertake a variety of ad hoc analyses of
subsets of the organization's total information
base for costing, pricing, valuing, analysing for
overlaps and duplication, analysing for
consolidation opportunities, and detection of gaps
existing in information.”

The InfoMapper™ constructed database will not 
contain any of the actual data of an information 
resource. Each record in the database contains 
information about the information resources entered. 
This is not just conventional data, such as subject or
classification number; it also includes various other
attributes that profile the nature of its information
content, its uses, its purposes, its beneficiaries and
other characteristics.
InfoMapper™ functions and procedures
The software is configured so that it can be used in 
five stages. These are summarized below: 
Stage One: Initializing forms and software 
Stage Two: Inventory and data collection 
Stage Three: Online data entry/editing 
Stage Four: Producing reports and analysis 
Stage Five: System management 


Stage Four, producing reports and file generation,
is the key stage from the point of view of the
Information Manager. InfoMapper™ generates
reports which can be used either to produce a
printed directory of information resources or to be
reviewed on screen. Nine types of report can be
created:


1. The Main Entry Report is a summary of a dozen or
so key data elements about each information resource.
This report provides the directory user with enough
information to decide what action he/she may need
to take next regarding the individual IRE, such as
calling the responsible contact person to obtain
more information.


2. The Subjects Index is a subject or keyword
reference tool. It is based on the major topics reported
for each IRE. It addresses the subject matter content
of the actual data, not the broad functional area
supported by the resource.


3. The Functions/Programs Index sorts information
resources by broad program and function. It is a
reference tool for users who want to ascertain which
IREs support or serve which functions or programs
within the organization. For example, all resources
supporting the function ‘Finance’ would be shown in
alphabetic order regardless of where these resources
are established and are operating in the organization.
People can use this index to ask ‘who is using which
resources for what purposes and with what results’,
etc.
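The grouping this index performs can be illustrated with a minimal Python sketch. It assumes simplified records with hypothetical `name`, `functions`, `mode` and `mer` fields and invented resource names; InfoMapper itself was built around a run-time dBase IV, and this is not its code.

```python
from collections import defaultdict

# Hypothetical, much-simplified IRE records; the names and fields are invented
# for illustration and are not InfoMapper's actual record layout.
records = [
    {"mer": 3, "name": "Wagon Fleet Register", "functions": ["Operations"], "mode": "Automated"},
    {"mer": 1, "name": "Budget Workbook", "functions": ["Finance"], "mode": "Automated"},
    {"mer": 2, "name": "Fuel Invoices File", "functions": ["Finance", "Operations"], "mode": "Manual"},
]


def functions_index(records):
    """Group IREs under every function they support, alphabetically by IRE name."""
    index = defaultdict(list)
    for rec in records:
        for function in rec["functions"]:
            index[function].append(rec)
    # Functions in alphabetical order; IREs sorted by name within each function,
    # regardless of where in the organization they are held.
    return {
        function: sorted(recs, key=lambda r: r["name"].lower())
        for function, recs in sorted(index.items())
    }


for function, recs in functions_index(records).items():
    print(function)
    for rec in recs:
        # Mode and MER number are printed alongside each IRE name.
        print(f"  {rec['name']} ({rec['mode']}) -> MER {rec['mer']}")
```

Because a resource can support several functions, it is listed under each of them; this is what lets the index answer ‘who is using which resources for what purposes’.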


4. The Official Names and Acronyms Index sorts the
information resources and arranges them in
alphabetic order by official name (i.e. full name
or title) and acronym. Users will be able to see at a
glance all IREs in the database, together with
cross-references to their acronyms (if any). Besides
listing both the IRE official name and popular name
or acronym, this index also identifies the class to
which the IRE belongs and its mode of storage
(manual, automated or a combination).


5. The Organizations Index sorts information
resources according to the organizational hierarchy
numbering scheme, and sorts the records either for
all units in the organization table or only those units
that have records. An organization unit is a person or
position in the organization that has been given a
classification number.


6. The Locations Index sorts information resources
alphabetically by their geographic location. It
supplements the Organizations Index. Instead of
listing IREs hierarchically by organizational unit,
this index lists IREs by geographic location in four
sort sequences: by nation, state or province, city, or
zip/postal code.
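The four sort sequences can be pictured with a short Python sketch. It assumes each record carries hypothetical `nation`, `state`, `city` and `postcode` fields (with `state` standing in for state or province) and sorts on a single field with the IRE name as tie-breaker; how InfoMapper orders ties internally is not stated in the text.

```python
# Hypothetical location fields for three IREs; "state" stands in for state/province.
records = [
    {"name": "Depot Timesheets", "nation": "UK", "state": "England", "city": "Doncaster", "postcode": "DN1"},
    {"name": "Colliery Tonnage Log", "nation": "UK", "state": "Scotland", "city": "Glasgow", "postcode": "G1"},
    {"name": "Power Station Bookings", "nation": "UK", "state": "England", "city": "Leeds", "postcode": "LS1"},
]

# The four sort sequences named in the text.
SEQUENCES = ("nation", "state", "city", "postcode")


def locations_index(records, sequence):
    """Return IREs sorted by one location field, with IRE name as a tie-breaker."""
    if sequence not in SEQUENCES:
        raise ValueError(f"unknown sort sequence: {sequence!r}")
    return sorted(records, key=lambda r: (r[sequence], r["name"]))


for rec in locations_index(records, "state"):
    print(rec["state"], "-", rec["name"])
```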


7. The Products/Services Index sorts information 
resources by the products and services produced; it is 
a kind of catalogue of products produced by the 
resource. 


8. The Hardware/Software Index sorts information 
resources by the hardware manufacturer brand, and 
model or type within brand, being used to run the 
resource.


9. The Cross-Reference/Administration Report is not
normally printed as part of the directory. It serves as
an appendix. It cross-indexes IRE numbers to names,
and sorts resources.


Chapter Six of the InfoMapper™: Project 
Manager's Guide? and Chapter Eleven of the 
InfoMapper™: User Manual give full details of 
these reports. 


Other InfoMapper™ Features 

The software comes with three guidance manuals,
each of which serves a different purpose. The User
Manual guides the user every step of the way
in getting started and using the software efficiently
and effectively. It provides ready reference material
long after the user has become comfortable and
familiar with basic operations. The Project Manager's
Guide steps the user through the inventory process
and provides more management information than is
contained in the User Manual. It provides insights
into how to plan, manage and control the project.
Finally, the Instructor's Guide ?' describes how to
educate and train InfoMapper™ users.

A system of help menus using function keys
guides the user through the steps of getting started
with, and operating, InfoMapper™. The software also
contains functions for access, security and protection
safeguards, with three levels of security.

However, although Horton claims that the system 
has been beta-tested in 500 sites world-wide”, users’ 
experiences of the system have not appeared in the 
literature. 

The purpose of this study was to assess the use of 
this software in a commercial environment. In the 
Spring of 1992, the Department of Information 
Science at Strathclyde University was approached 
by TrainLoad Coal, a subsidiary of British Rail, to 
undertake information resource mapping in its 
organization. TrainLoad Coal willingly agreed to 
become a test bed site for the software. 


Overview of Trainload Coal 

The British Railways Board comprises a number of
businesses, of which TrainLoad Freight (TLF) is
currently the most profitable. TLF, as the name
suggests, is concerned with the freight business. It is
a specialized service offered to major industrial
companies, e.g. in electricity and coal, that require
heavy haulage transportation. TLF is divided
into four profit centres:

1. Trainload Coal

2. Trainload Petroleum

3. Trainload Metals

4. Trainload Construction

Within each profit centre team, there are Area
Fleet Managers and Fleet Managers. In order to
support the profit centres, seven additional 
departments were created, offering a wide range of 
specialist skills and services. 

These Departments are as follows: 

1. Traction and Rolling Stock
2. Infrastructure
3. Personnel
4. Finance
5. Total Quality Management
6. Operations
7. Trainload Business Management

Currently, Trainload Coal (TLC) states that its ultimate
business objective is to: ‘maintain and develop our
position as the leading force in heavy haul
transportation. We will achieve this by fulfilling the
varied needs of our customers, owners and employers
safely and to high standards’.??

TLC is Britain's major coal carrier, operating 
more than 250 trains per day, serving power stations, 
steelworks, cement works and chemical plants. The 
business is concentrated on coal fields and power 
stations throughout the UK, especially in the 
Midlands, Yorkshire, North East, North West, South 
West and Scotland?. Over the past 25 years, such
operations have satisfied their customers’ needs by 
being reliable and meeting fluctuating traffic patterns 
and tonnage. 

Since about 1975, British Rail has used new 
technology to aid its operations. The result is a very 
complex technological infrastructure consisting of 
mainframes, stand-alone personal computers, local
area networks, workstations, opportunities for 
electronic data interchange, electronic mail, the 
widespread usage of databases and spreadsheets etc. 
British Rail owns about 600 LANs, and 10,000 PCs. 
Over 40 database management systems are used in 
TLF alone. 

Such a spread of information systems has led to
very complicated flows of data between the various
systems, from the operational level to the management
levels. Computers and information technology are 
now an integral part of the working lives of TLC 
staff. 

The proliferation of information systems 
throughout TLC has led to a plethora of data, i.e. on 
floppy disks, hard disks, paper, computer printouts, 
etc. The value of this data or information is quite 
often questionable in terms of quality, reliability and 
accuracy. Such problems have a number of
implications for the organization, particularly at the
strategic level, where decisions are of an unstructured
nature and are at their most critical to the
organization. These decisions are based on the
information that is passed up through the various
organizational levels, and it is therefore imperative
that such information is monitored for its reliability
and managed so that the decisions being made
are reliably informed.

British Rail is required to adopt a five-year 
forward planning schedule. This is a statutory 
requirement specified by the Department of 
Transport. It is obviously important that such plans
are based on good information, because many of the
policies that arise from this information will
inevitably affect the organization, whether through
staffing cuts or decisions that involve millions of
pounds.

TLC has a complex infrastructure, which tends to 
produce a large amount of data. Much of this data is
unused because many potential users do not
understand the full capability of the systems. This
can be attributed to a lack of training or lack of 
material supporting the systems when they are first 
implemented. This type of problem can have a knock- 
on effect in that many users have not been informed 
what data or information is kept within certain 
systems. 

These factors can also lead not only to a 
duplication of systems, but also to a duplication of 
information. If a system is badly designed or poorly
documented, it is possible that it will not be used.
Consequently, the same procedures for gathering
data will be duplicated by someone who requires
certain information but does not know how to obtain
it, or does not realize that it already exists. He or she
then creates a personal system containing the same
information that is held in another system.
This problem can present the organization with 
unnecessary overhead expenses and loss of 
productivity. 

Another effect of a complex information 
infrastructure is that of ambiguous ownership of 
information. Different people from different parts
of the organization may claim rights to the same 
information. Similarly, some people may tend to 
guard their information so that no-one else can share 
it or benefit from it. People fear personal loss if their 
information is shared because they may have gained 
their status within the organization due to their 
knowledge and sources of information. They 
consequently perceive themselves as being 
indispensable and obviously wish to maintain the 
status quo. 

It quite often happens that the higher up the 
organizational hierarchy one goes, the less predictable 
the information needs of staff at these levels are. 
People do not know what information they need until 
a particular circumstance arises. It is also more than 
likely that they will need the information ‘instantly’ 
but cannot obtain it. This leads to dissatisfaction with 
the systems. 
RailPlan

The basic aim of Railplan is to compile a
survey of what has happened nationally in TLC. It is
a planning and forecasting system. Plans are under
way for running Railplan on Lotus 1-2-3. There are a
number of specific forecasts required from Railplan.
To compile these, certain types of information are 
required. When the Planning and Evaluation Manager 
analysed what types of information would be required 
for Railplan, he concluded it required the following 
information: 

1. Current information on TLC’s performance:

   Trading turnover: revenues

   Working expenses: cost of train crew; cost of train
   provision; cost of fuel; operations control; train
   maintenance; terminal costs; commercial services and
   security; general expenses; depreciation;
   infrastructure costs

   No. of salaried/waged staff
   No. of staff employed/underwritten

   Physical facts: train miles; net ton miles; wagon
   miles; loco miles; tonnes moved; no. of wagons;
   no. of locomotives; no. of terminals, i.e. power
   stations, collieries

2. Retrospective data on performance for comparisons

3. Forecasts

4. Monitoring of updates of changes taking place in TLC

5. Expert opinions


Research methodology 

The basic objective of the research was to apply
InfoMapper™ to TLC. In order to do this, we followed
the instructions and guidelines set out in the
InfoMapper™ manuals and in Burk and Horton?. In a
few instances, we altered the methodology in order to
meet the circumstances and needs that arose during
the research. In particular:


Time-scale 

This research was undertaken over a three month 
period. Horton has recommended that the minimum 
length for this sort of study should be six months. 
Due to these time constraints, we only studied a
proportion of the information needs and parameters
outlined by TLC. InfoMapper™ is designed primarily
for larger, more extensive undertakings with a longer
time-scale.
‘Project Team’

Another of Horton’s specifications is for a ‘project 
team’ to be allocated to various stages of the 
methodology. The project team in this case was one 
of us (KB), together with a senior member of TLC 
staff (Owen Woodward). This factor also limited the 
scope and coverage of the study. If a larger ‘project 
team’ had been involved, we would have collected 
data from a much larger sample of TLC staff. 

Following initial meetings with TLC staff, one of 
us (KB) designed parameters for a questionnaire. 
Having defined these, she entered them into
InfoMapper™, which allows the user to ‘Initialize
Class’. Within this function, it is possible to edit the
information that is entered. It was important to enter
the correct information at the start, because
re-initializing means adding new choices or deleting
old ones, which can leave a mismatch between the
categories and the actual data values already entered
into the records.
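The mismatch can be made concrete with a small check that lists records whose stored value is no longer one of the defined categories. This is a hedged Python sketch, not an InfoMapper function; the user-defined field `profit_centre`, the record names and the merged category are all hypothetical.

```python
def find_mismatches(records, field, allowed_values):
    """List (IRE name, value) pairs whose value is not a currently defined category."""
    allowed = set(allowed_values)
    return [(rec["name"], rec.get(field)) for rec in records if rec.get(field) not in allowed]


# Hypothetical user-defined field holding the TLF profit centre served by each IRE.
old_categories = ["Coal", "Petroleum", "Metals", "Construction"]
# Hypothetical re-initialization that merges two categories into one.
new_categories = ["Coal", "Petroleum", "Metals and Construction"]

records = [
    {"name": "Colliery Tonnage Log", "profit_centre": "Coal"},
    {"name": "Steel Traffic Forecasts", "profit_centre": "Metals"},
    {"name": "Aggregates Bookings", "profit_centre": "Construction"},
]

print(find_mismatches(records, "profit_centre", old_categories))  # empty before re-initializing
print(find_mismatches(records, "profit_centre", new_categories))  # records to re-edit afterwards
```

Running such a check after every re-initialization would show exactly which records need editing, rather than relying on spotting them by eye.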

The method for collecting data about the IREs 
involved one of us (KB), accompanied by Owen 
Woodward, visiting the relevant personnel in the 
various TLC offices. These people were identified 
according to the IREs, i.e. who should know about 
specific IREs, who manages them, and what system 
holds this information. One-to-one interviews were
carried out, as we felt that posting a lengthy
questionnaire (six pages), with only a guidance
booklet for help, would not generate many responses.
Eleven interviews were carried out during the summer
of 1992.
The length of the interviews varied greatly depending 
on the status, position, interest and the number of 
resources held by the interviewee. 

The interviews were based on the questionnaire, 
but they moved towards open discussions about the 
information resources. The forms were analysed 
immediately after the interview. This involved 
identifying those data elements that were subject to a
variety of interpretations, abbreviations, terminology
conventions and other variations, and developing a
format for input. This improved the speed of the data
entry into InfoMapper™, as well as helping to ensure 
data integrity. 

InfoMapper™ has on-screen forms for entering 
data. Eleven screens of data (73 data fields in total) to 
describe each information resource were filled in. 
The data for each resource include its name, classification,
status, availability, purpose, authorization, storage 
media, products, management control and supporting 
technology. 

KB created the basic IRE records, one at a time, 
and edited them for completeness and correctness. 
KB carried out data entry in Glasgow. Therefore, 
immediate output of results of the questionnaire 
could not be generated at the particular TLC offices, 
or for the Planning and Evaluation Manager, to scan 
and offer an opinion. 

Horton outlines several options for data input
into the database, i.e. central input, field input and
direct electronic entry. This should be done by the
‘project team’. However, because only one person was
involved, and because the Department holds
InfoMapper™ only on a stand-alone PC, input tended to
be very time-consuming. Altogether, 42 complete
records were compiled and input into InfoMapper™.
This is a substantial enough sample upon which to
evaluate InfoMapper™ and reactions to its output.
We have already noted that InfoMapper™ can 
generate nine types of reports. Using these, TLC 
could assess the worth of the database and the printed
directory as planning, control and quick reference 
tools. As suggested by Horton, each of the nine 
reports was printed off to form a directory of the 
IREs that had been collected. 

We found printing a Directory very time-consuming.
We also downloaded reports to an ASCII file. This was
also a lengthy process. It took, for example,
approximately 20 minutes for the Main Entry Report
File to be downloaded to disk using a fast (486) PC.

This slow process appears to be the fault of the 
software. We found that InfoMapper™ was 
compatible with the word-processing package 
WordPerfect 5.0. Being compatible with such 
packages is very useful if a Directory is to be created,
as it will allow instructions and explanatory text to be
included within the Directory.

To test and assess the procedure for creating an 
information resource map, we followed Burk and 
Horton's instructions for creating worksheets and 
charts. To do this, we used the electronic worksheets 
laid out in Lotus 1-2-3. We slightly altered Burk and 
Horton's technique for creating a User Matrix, a 
Supplier/Handler Matrix and a Manager Matrix, to 
meet the Railplan requirements of TLC. Instead of 
three worksheets, we created one indicating the 
following: 

1. information required for Railplan forecasts;

2. the people interviewed; and

3. the systems that were examined.

We then placed ‘X’s in the grid to show who held
each item of Railplan information, or in which system
it was held.
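The single worksheet described above was built in Lotus 1-2-3; the same grid can be sketched in Python as a mapping from Railplan items to the people or systems holding them. The item names come from the Railplan list, but the people, systems and holdings are hypothetical. The `unheld` helper is an added illustration of how an empty row points to a possible gap in the information base.

```python
# Hypothetical interview findings: who, or which system, holds each Railplan item.
# Item names come from the Railplan list; people and systems are invented.
holdings = {
    "Revenues": {"Finance Manager", "Revenue system"},
    "Cost of fuel": {"Finance Manager"},
    "Tonnes moved": {"Fleet Manager", "Freight system"},
    "Net ton miles": set(),
}
columns = ["Finance Manager", "Fleet Manager", "Revenue system", "Freight system"]


def build_matrix(holdings, columns):
    """Return one row per Railplan item, with 'X' where a column holds that item."""
    return {item: ["X" if col in holders else "" for col in columns] for item, holders in holdings.items()}


def unheld(matrix):
    """Items with no 'X' at all: possible gaps in the information base."""
    return [item for item, row in matrix.items() if "X" not in row]


matrix = build_matrix(holdings, columns)
print("".ljust(15) + "".join(col.ljust(17) for col in columns))
for item, row in matrix.items():
    print(item.ljust(15) + "".join(cell.ljust(17) for cell in row))
print("Not held anywhere:", unheld(matrix))
```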


Results 

We evaluated each of the five separate stages of the 
software on its own merits, because each of the 
stages is very different in its purpose and 
methodology. By doing this, we obtained a clearer 
picture of the performance of the methodology, and 
system, as a whole.

Initializing the software 

This stage was very straightforward. This was mainly 
because of the step-by-step guidance offered by 
InfoMapper™ on-screen and the supporting 
documentation. The package had already been 
installed on a PC in the Department of Information 
Science, University of Strathclyde. Before collecting
actual data on the information resources used by
Trainload Coal, we decided how to identify, describe 
and classify these resources. InfoMapper™ assisted 
to a small extent. However, the majority of the 
decisions were left to our own discretion and 
judgement. 

There are two areas that had to be initialized:
the organizational class and the organizational table.


Organizational class 

The central purpose behind using the appropriate 
organizational class is to customise a set of seven 
user-defined fields on the data collection form. This 
is so that the available and alternative choices 
presented and the specific nomenclature used to 
describe them, are meaningful to the users’ 
organization. This can be done by either using the 
system-defined terms or user-defined, which offers 
no default choices. 

This option was necessary because no
default choice matched the procedures of
TLC. There was a possible option of Service
Businesses that encompassed finance, travel, etc., 
but it was not specific enough. We found that many 
of the terms used reflected American methods of 
working. 

The most difficult part of this stage was clarifying 
what each of the fields meant, and defining suitable 
categories to fit these. Although the supporting
documentation and online help offered numerous
examples, it was quite difficult to draw comparisons 
between those and TLC operations. 


Organizational Table 

The purpose behind selecting an organization table 
was to indicate which IREs are accountable to whom. 
Typically, organizational unit names are standardized,
and among Horton’s examples is the ‘Office of the
Vice President for Finance’, again reflecting
American terminology. If a user does not have an
organizational table of their company’s hierarchy,
there is the option to use a default table provided in
InfoMapper™. We chose to compile an organizational
table reflecting TLC's hierarchy. It was necessary to
enter this table unit by unit. A user starts by assigning
the organizational unit number, e.g. 000 100 001,
then the higher level sub-field, i.e. position title, and
a lower level sub-field, i.e. location. One can add or
edit organizational units. This is a useful function, 
because organizations are constantly being 
reorganized! 

Although numbers are used for classifying the 
IREs, they are otherwise meaningless. TLC has a 
complex organizational structure. Although an
extensive choice of numbers existed, we feel that the
software did not accommodate the number of
levels that existed in TLC.

Although the default table is there to be used as 
an example, the same problems of definition arose as 
those for the organizational class. 
We found the ability to re-initialize the system useful
if you changed your mind, or had to make changes to
the data that was originally entered. However,
the one drawback was that, if re-initialization does
take place, the data have to be carefully edited, since
they may be mismatched between the old and new
categories specified in the user-defined fields.
Unfortunately, the instructions in the manuals and in
the ‘Help’ screens were unclear. It takes a substantial
amount of time to re-initialize the system and alter
data that had already been entered. The InfoMapper™
User Manual does warn you of problems that may
arise when re-initializing.


Data collection 

Within the Setup/Management menu, there is an 
option for printing the blank IRE data collection 
form whose fields have been initialized (either system 
or user defined). This is an essential feature of 
InfoMapper™, because it is necessary to have copies 
of the form not only for distribution, but to check if 
any amendments to the form need to be made. The 
blank form consists of six pages, which are available 
on-screen for editing purposes. 

The questionnaire is long and complex to use. 
We decided that a number of fields were irrelevant to 
the requirements of TLC, and these were deleted. 

The software is, in theory, flexible regarding
customization. However, in some ways it is inflexible:
there are fields that you may wish to delete from the
form, i.e. those deemed irrelevant, but cannot, because
they are in a fixed format. For the sake of presentation
and ease of use, it would be useful to be able to alter
or delete these fields as required.

As mentioned earlier, we carried out a series of 
interviews to determine the information resources of 
Trainload Coal personnel. Horton advises that users
closely follow the questionnaire format to gain
maximum effect. However, we found that when we 
conducted interviews that rigidly followed the 
questionnaire structure, the interview was not as 
successful in gleaning information as when it was a 
more informal interview. When the interview 
followed a more relaxed and discursive direction, 
using the form as a basis upon which to ask questions, 
it was more fulfilling and highlighted many interesting 
points for input into the questionnaire. 

The form uses too much information management 
jargon for it to be understood by someone unfamiliar 
with the subject. 

It would take a full day to get a clear picture of all 
the IREs held by an interviewee. The length of the 
questionnaire contributes to the time taken during an 
interview. We believe that it could be reduced whilst 
still succeeding in its underlying purpose. 

Horton recommends the distribution of the 
questionnaire throughout the organization, along with 
guidance notes on the meanings of the fields, etc. We 
believe this approach would result in a low response
from employees. Face-to-face interviews meant finding
out first-hand about IREs. They also allowed the
interviewees to think about their information
resources, which should increase their knowledge and
awareness of the value of the information they hold.


Data entry and edit 

There are eleven InfoMapper™ screens of data that 
reflect the fields on the printed data collection form. 
These screens are in fixed format. For example, one 
screen asks for a list of key products, reports, outputs 
and services provided and delivered by the 
information resource. Five blank fields, with a 
maximum of 30 characters in each field, are provided 
for these data. If more than five key products are 
involved, there is no way to accommodate the 
additional products. When the user inputs data too 
large for an entry field, the excess keyboarded 
characters run over into the next field. Quite often,
the entry field was too short for certain entries,
which meant that they had to be abbreviated,
thus partially losing their meaning. If the user fails
to realize this has happened, the remaining letters 
will be saved as the title of that field. However, the 
user can edit the entry. On the other hand, in many 
cases, InfoMapper™ provides more entry fields than 
the user needs. 
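Because these limits are fixed, it may pay to check transcribed interview data before keying it in. The sketch below is a hypothetical pre-entry check in Python, not part of InfoMapper; only the two limits (five fields, 30 characters each) come from the text, and the product names are invented.

```python
MAX_SLOTS = 5   # key product/service fields on the screen, as described in the text
MAX_CHARS = 30  # characters per field, as described in the text


def check_products(products):
    """Return warnings for entries that will not fit the fixed product fields."""
    warnings = []
    if len(products) > MAX_SLOTS:
        extra = products[MAX_SLOTS:]
        warnings.append(f"{len(extra)} product(s) have no free field: {', '.join(extra)}")
    for entry in products[:MAX_SLOTS]:
        if len(entry) > MAX_CHARS:
            warnings.append(f"{entry!r} is {len(entry)} characters; abbreviate to {MAX_CHARS}")
    return warnings


# Hypothetical products listed by one interviewee.
products = [
    "Weekly tonnage report",
    "Monthly locomotive availability summary",
    "Fuel cost statement",
    "Crew roster",
    "Terminal list",
    "Depreciation schedule",
]
for warning in check_products(products):
    print(warning)
```

Agreeing on abbreviations at this stage, rather than at the keyboard, would also avoid text silently running over into the next field.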

Transcribing the questionnaire data into
InfoMapper™ took between 20 and 35 minutes per
entry. This varied depending on the amount of data
that had to be entered for any field. InfoMapper™
provides features that are useful in speeding up
data entry and in improving consistency and editing.
InfoMapper™ offers users the ability to copy an
existing record to create a new one. One can copy a
single field, many fields or all fields. This saves
much duplicative data entry. It can also avoid a
number of inconsistencies, particularly where
composite records exist, i.e. where a resource is
common to more than one organizational unit, but is
not standardized.
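Field-level copying of this kind can be sketched as follows. This is an illustration in Python with a hypothetical helper and hypothetical field names, not InfoMapper's implementation; the deep copy keeps the new record's lists independent of the original's, so editing one record cannot silently alter the other.

```python
import copy


def copy_record(source, new_mer, fields=None):
    """Create a new IRE record from an existing one (hypothetical helper).

    fields=None copies every field; otherwise only the named fields are copied.
    The new record always receives its own MER number.
    """
    if fields is None:
        new = copy.deepcopy(source)
    else:
        new = {name: copy.deepcopy(source[name]) for name in fields}
    new["mer"] = new_mer
    return new


# A resource held by two organizational units: copy the whole record, then
# change only the organizational unit number (numbers here are hypothetical).
roster = {"mer": 12, "name": "Crew Roster", "org_unit": "000 100 001", "subjects": ["Train crew"]}
second = copy_record(roster, 13)
second["org_unit"] = "000 100 002"

# Copy a single field to seed another record.
seeded = copy_record(roster, 14, fields=["subjects"])
print(second)
print(seeded)
```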

We found the data entry screens very easy to use. 
After making choices that will affect fields that 
appear later in the form, the computer asks you to re- 
confirm that entry. For example, where you have to 
enter whether the system is Manual, Automated or a 
Combination of both, this has to be confirmed. The 
entry will determine the nature of the storage facilities,
hardware and software sections. 





onto an ASCII file, and import the file into a desktop
publishing or word-processing package for
manipulation and better presentation. Generating the
reports was a straightforward process. How useful 
are they? 


Main Entry Report 

This report is the ‘richest’ disclosure of the database’s
contents for an IRE record. It contains 18 key fields
and has an identifying number for each entry (the
Main Entry Report number (MER)), which is used to
cross-reference the IRE entry in each of the other
indexes. Not all the fields in the basic IRE record are
disclosed in the MER. The Directory is meant as a
ready reference tool, not an exhaustive list. However,
we feel that some fields that are left out should have
been included in the MER, such as: Remarks; All
contacts (the contact name included in the MER is
not explicit enough). Others that are included could
be irrelevant, such as: Hardware and Method of
retrieval. It would be helpful if users could select
the fields that they feel are useful and relevant to the
organizational information resource inventory.

The main problem with generating the MER in 
print format is the time it takes to print. When you 
select the Print option, you have to print all entries 
and although it is possible to interrupt printing, you 
cannot select which records you wish to print. This is 
tiresome, particularly for someone who updates a 
record and wishes to add the amended record only 
into the Directory. If, say, the MER number is 100, 
this means that all the preceding records have to be 
printed as well, unless the file has been downloaded 
to disk — which also takes time, as noted before. 


Subjects Index 

In the Project Manager's Guide, Horton cautions that
InfoMapper™ is fundamentally ‘an information
management tool’, not a ‘database searching tool’.
This explains the unsophisticated search mechanism
within InfoMapper™. A maximum of 12 subject
terms can appear in the subject field of the basic
record. Only 12 terms cannot do justice to describing
the full data contents of an information resource. We
regard this as a serious limitation of the index. If an
in-depth search of the actual data contents of the
resource is required, an organization should instead
use one of the well-known text retrieval packages.
TLC noted that they did not wish to buy Info- 





descriptive attributes, not actual data values, and not 
the broad functional area supported by the resource, 
which is undertaken by the Functions/Programs 
Index. It was often difficult to distinguish between
the two.

Within the Subjects Index, IREs are cross- 
referenced among the various key subjects that relate 
to them. Therefore, regardless of how someone thinks 
about an IRE, there is a good chance that it can be 
found indexed by subject. Pop-up menus offer the
option of selecting all IREs or a selected subject. In
the latter case, a user needs to enter a term, key word
or letter. For example:

Key word: Train 
Term: Train provision 
Letter: T 

This feature was very efficient. However, it could
be improved by allowing truncation, e.g. Train?
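The three lookup forms, plus the proposed truncation, can be sketched in Python against a hypothetical subject index that maps each term to MER numbers. The matching rules (a single letter matches the start of a term, a key word matches a whole word, a full term matches exactly, a trailing `?` truncates) are assumptions modelled on the examples above, not documented InfoMapper behaviour.

```python
def select_subjects(index, query):
    """Return sorted MER numbers for IREs whose subject terms match the query.

    Query forms (an assumption modelled on the examples in the text):
      "T"                a single letter: terms beginning with that letter
      "Train"            a key word: terms containing that whole word
      "Train provision"  a full term: exact match
      "Train?"           proposed truncation: any word beginning with "Train"
    """
    q = query.strip().lower()
    hits = set()
    for term, mer_numbers in index.items():
        t = term.lower()
        words = t.split()
        if q.endswith("?"):
            matched = any(word.startswith(q[:-1]) for word in words)
        elif len(q) == 1:
            matched = t.startswith(q)
        else:
            matched = q == t or q in words
        if matched:
            hits.update(mer_numbers)
    return sorted(hits)


# Hypothetical subject index: term -> MER numbers of the IREs indexed under it.
subjects = {
    "Train provision": [4, 7],
    "Train crew costs": [2],
    "Training courses": [9],
    "Tonnes moved": [5],
    "Fuel costs": [3],
}
print(select_subjects(subjects, "Train"))   # key word only
print(select_subjects(subjects, "Train?"))  # truncation also finds "Training courses"
```

The difference between the two calls shows the value of truncation: the key word `Train` misses `Training courses`, whereas `Train?` retrieves it.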


Functions/Programs Index

This index consists of the IREs that support each
function, sorted by IRE name in alphabetical sequence.
The index in theory also indicates the mode of each IRE,
i.e. Manual, Automated or Combination of both. It
also cross-references the IRE to its corresponding
entry in the Main Entry section. However, the index
we created did not indicate these, whereas the
examples in the User Manual did. The main benefit
of the index is, of course, that it groups resources
under broad categories, which will provide a survey
of where certain resources predominate. However,
we feel that it mirrors the Subjects Index too closely
for it to be really useful.


Official Names/Acronyms Index 

This index identifies the IRE classes to which an 
information resource belongs, i.e. Single, Multiple, 
Standardized, Non-standardized and the mode of the 
IRE, i.e. Manual, Automated or Combination. It also 
contains the MER number for each IRE listed, for 
cross-referencing purposes. The main purpose of the
index is to help the user identify the resource in
question quickly by its official name or its acronym.
This is useful because users commonly give
information resources a popular name that is more
memorable than the official name, and that name
then becomes the one used throughout the organization.
TLC is no exception to this rule.

This index is also useful for identifying duplicates,
for example where two or more resources have been
developed that perform essentially the same
function. Sometimes an information resource
operates outside the scope of the official standardized
systems, or it is simply a duplicate because no-one
knew that such a resource already existed.
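One way to support the duplicate check described above is sketched below. It assumes, purely for illustration, that each IRE record carries a MER number, an official name, an acronym and the function it supports; the field names, sample records and both helper functions are hypothetical and not part of InfoMapper™. Grouping by function is our own heuristic for surfacing candidate duplicates, which still need a human judgement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IRE:
    mer: int             # MER number used for cross-referencing (sample values are made up)
    official_name: str
    acronym: str
    function: str        # assumed field: the function the resource supports


def name_index(ires: list[IRE]) -> dict[str, int]:
    """Index every IRE under both its official name and its acronym, case-insensitively."""
    index: dict[str, int] = {}
    for ire in ires:
        index[ire.official_name.lower()] = ire.mer
        index[ire.acronym.lower()] = ire.mer
    return index


def possible_duplicates(ires: list[IRE]) -> dict[str, list[int]]:
    """Return each function supported by more than one IRE, with the MER numbers involved."""
    by_function: dict[str, list[int]] = defaultdict(list)
    for ire in ires:
        by_function[ire.function].append(ire.mer)
    return {f: sorted(mers) for f, mers in by_function.items() if len(mers) > 1}


IRES = [
    IRE(101, "Passenger Timetable System", "PTS", "timetabling"),
    IRE(102, "Depot Timetable Spreadsheet", "DTS", "timetabling"),
    IRE(103, "Staff Training Register", "STR", "training"),
]
```

Here `possible_duplicates(IRES)` flags MER 101 and 102 as two resources serving the same timetabling function.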


Organizations Index 

This index lists the organizational unit numbers, and 
the staff titles or units. It also shows the MER number 
and the title of the IRE held within a particular 
location or unit. Two sequences are offered: all units
in the organizational table, or a selected part of the
table. The main value of this index is that it identifies
at a glance all the IREs that belong to a given
organizational unit. It is also useful for identifying
which organizational units have no IREs assigned
to them.
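Both uses of the Organizations Index (the IREs held by each unit, and the units holding none) fall out of a single grouping step. The sketch assumes a hypothetical list of (unit number, MER number, IRE title) assignments; the unit numbers and titles are invented for illustration.

```python
Assignment = tuple[str, int, str]   # (unit number, MER number, IRE title)


def ires_by_unit(units: list[str], assignments: list[Assignment]) -> dict[str, list[tuple[int, str]]]:
    """List the IREs (MER number, title) held by each organizational unit, sorted by MER number."""
    index: dict[str, list[tuple[int, str]]] = {unit: [] for unit in units}
    for unit, mer, title in assignments:
        index.setdefault(unit, []).append((mer, title))
    for entries in index.values():
        entries.sort()
    return index


def units_without_ires(index: dict[str, list[tuple[int, str]]]) -> list[str]:
    """Units in the organizational table that have no IREs assigned to them."""
    return sorted(unit for unit, entries in index.items() if not entries)


UNITS = ["100", "110", "120"]
ASSIGNMENTS = [("110", 7, "Fares manual"), ("100", 3, "Timetable database"), ("110", 2, "Ticket ledger")]
```

Seeding the dictionary with every unit from the organizational table is what lets empty units show up at all; an index built only from the assignments would never list unit 120.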


Locations Index 
This index supplements the Organizations Index.
Instead of listing the IREs hierarchically by
organizational unit, it lists them by geographical
location. Four different ‘sorts’ are available: Cities;
Nations; States/Provinces; and Zip/Postal Codes.
Because we were dealing with a British company, we
found only the first two relevant to this project.
When using this index for searching, there is the
option of displaying all locations or a single exact
location. The index serves as a type of ‘map’
indicating who holds which information resource and
where it is held.
From the index it was possible to find out quickly, 
for example, all the IREs operated at a specific 
location. 


Hardware/Software Index 

The hardware section lists all IREs that use
automated equipment to support some aspect of
the IRE’s operations. The software section lists
all IREs that use some kind of software, whether
operating system software or application support
software, to support those operations. The index has
the potential to be a valuable analytic tool for
identifying hardware and software investments and
the various levels of implementation of certain
resources. It would also be possible to see which
software packages are used most often, and to use
this to inform decisions on future investment,
training and application development that build on
the established base and knowledge of the software.
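The ‘which packages are used most often’ question is a simple frequency count over the software section. The sketch assumes a hypothetical extract of that section as a mapping from IRE name to the packages it uses; apart from Paradox and Lotus 1-2-3, which the report mentions at TLC, the package names and IREs are invented.

```python
from collections import Counter

# Hypothetical extract from the Hardware/Software Index: IRE name -> software packages it uses.
SOFTWARE_INDEX: dict[str, list[str]] = {
    "Timetable database": ["MS-DOS", "Paradox"],
    "Fares manual": ["MS-DOS", "WordPerfect"],
    "Staff records": ["MS-DOS", "Paradox"],
    "Ticket ledger": ["Lotus 1-2-3"],
}


def package_usage(index: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Count how many IREs use each package, most widely used first."""
    counts: Counter[str] = Counter()
    for packages in index.values():
        # Count each package once per IRE, keeping first-seen order for ties.
        counts.update(list(dict.fromkeys(packages)))
    return counts.most_common()
```

Applied to this sample, the operating system comes first (three IREs) and Paradox second (two IREs), the kind of established base on which training and development could be built.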


Product/Services Index 

This index provides an alphabetical listing of all key 
products, reports, outputs and services that an IRE 
generates for end-users, patrons, clients and other 
beneficiaries. It shows how to access the particular 
service, etc., thus providing a guide to its location. 
We feel that there should have been more guidance
on the types of heading to use, as products and outputs
vary a great deal and generalized terms did not
prove helpful.


Cross-Reference Reports 

These reports serve mainly as ready-reference
tools when a user wants to locate an IRE that is already
in the database by identifying or verifying its number
and name, its IRE classification, its MER number, or
whether the IRE contains the purpose and/or authority
statement. This is a useful management tool.


Systems Management

A feature of InfoMapper™ is the option for File
Maintenance. Users can create backup files, restore
files and re-index the database. The backup
function backs up both the program and the data
files, so that the files will be up to date when there
are new releases of the program. This guards against
hardware or software failure, data loss, theft
and so on. The procedure for doing this is well documented.
InfoMapper™ also makes provision for recovering
the database if it becomes corrupted, and the
procedure for doing so was very straightforward.

We did experience corruption problems, for no apparent
reason. The system displayed a message that the
database required re-indexing and would not allow
access to the records until this was done. It also stated
that some data might be lost. Fortunately no data was
lost, but there was no explanation, either in the manuals
or on screen, of why this had occurred.


Other features 

The ‘Help’ facility on InfoMapper™ was a very 
strong feature and was always available throughout 
the various stages involved with using the database. 
The advice was clear and appeared on-screen instantly 
when requested. When an error occurred within the
system, error messages appeared promptly on-screen.
However, it was necessary to refer to the User Manual
to decipher what an error message meant, and the
explanations there were not very helpful: they only
stated briefly how to remedy the problem. We
think a user who only uses the system for input and is
not computer literate would have difficulties with
the software.

The procedures for operating the system were 
well documented and supported. The User Manual 
detailed all the functions and capabilities of the 
system and was comprehensive. The Project
Manager’s Guide was a useful supplement to the
Manual because it recommended procedures for
carrying out the baseline inventory and for analysing
the findings. We also recommend reading Burk and
Horton’s book before proceeding with the inventory.

The user interface is attractive, especially on a 
colour monitor. The pop-up window menus are an 
excellent feature that guide users through the process 
of data input, edit and output in a very effective 
manner. However, users need to be familiar with all 
the InfoMapper™ terms. 


Initial conclusions on InfoMapper™ 

InfoMapper™ has a number of limitations. It is very
inflexible: because the fields are in a fixed format, it is
difficult to adapt the screens to suit the organization.
InfoMapper™ also does not offer good value for money.
Although the database has already been designed and
all the technical work of database creation has been
done, it is ‘simply’ a version of dBaseIV without the
latter’s searching capabilities and flexibility of output.
If we compare InfoMapper™ to other DBMSs, e.g.
Paradox or dBaseIV, the cost is within the same range,
but the DBMSs offer more functions than InfoMapper™
and are therefore better value for money. On the other
hand, with an ordinary database package such as
dBaseIV, the user would have to do a lot of programming
to get the user-friendly screens that InfoMapper™
provides; in this respect, InfoMapper™ saves much time.

The most surprising and disappointing fact is the 
lack of a mapping function. You cannot create a two- 
dimensional map of information resources and flows 
using InfoMapper™. We compiled a chart of IRE
locations within TLC using Lotus 1-2-3, and think such
a function should be integrated with the system. The
nearest the system comes to creating a map is the
printout of the Organizational table, Locations Index
and Organizations Index. Indeed, it is arguable that
the very name ‘InfoMapper’ is misleading, since it may
lead users to believe that the software will produce a
map of their information resources.

The ‘Directory of Information Resources’
produced by InfoMapper™ is very cumbersome to
use, so the information resource worksheet is a
valuable ‘quick-reference’ supplement to it. The two
complement each other: for further information on a
resource listed in the worksheet, the user can refer
to the Directory.

TLC offered some comments regarding the 
project. They found the ideas behind information 
mapping and InfoMapper™ both interesting and 
applicable to solving their information needs. TLC 
felt that to make the exercise more meaningful, 
everybody within the TLC Headquarters would have 
to be interviewed and/or fill in the data collection 
form. The consensus was that the questionnaire
was too long-winded and confusing, and often
contained irrelevant questions. To obtain a better
response, a new questionnaire would have to be
designed that eliminated irrelevant questions
and could be more easily understood by staff.

TLC felt the results reinforced their opinion that
the information being produced in TLC is possibly
not being used to its full potential. Although they
would not consider buying the InfoMapper™
software, they feel it would be worthwhile adopting
some of the ideas and compiling a simpler, more
comprehensive database of TLC information sources,
possibly using Paradox (a popular DBMS package
at TLC).

Clearly, using InfoMapper™ needs management
support, with management-defined objectives, full
staff co-operation and understanding of IRM, and
working parties or project teams; in short, full
corporate co-operation. In many ways, the InfoMap
methodology is very simplistic and does not take
different organizational cultures sufficiently into
account. This was particularly evident when we
attempted to obtain ball-park figures for how much
money was spent on various TLC information resources.
This is a standard question in the technique, but
individual staff are not aware of such figures.

We found the software very helpful for carrying
out the methodology, despite its complexity. Each
stage of the technique was outlined in great detail,
and installing, initializing and customizing the
software was very straightforward. Because the
package already provides the required database
structure, it saves much time: there is no need to
create the database or its interface. The user
interface and user-friendliness of InfoMapper™ are
among its redeeming features. On a colour monitor
it is very pleasant to use, despite being cumbersome
at the data entry stage. This is the main advantage of
InfoMapper™.

Although each report has its own advantages and
disadvantages, the reporting function is helpful
for producing a hard-copy, Directory-type reference
tool on the organization’s information resources.
Because of its compatibility with word-processing
and desktop publishing software, this Directory
could be presented in a very attractive manner.

However, InfoMapper™ does have many 
disadvantages. The main drawback is that it is very
inflexible, as shown by its limited searching
capability, the fact that it can be tailored to an
organization’s needs only to a limited extent, and its
reliance on American terminology. All these factors
make it less useful outside the USA.

Such inflexibility, when compared with the possible
capabilities of an ordinary DBMS, makes InfoMapper™
poor value for money. Using a DBMS such as 
dBaseIV would allow an organization to create a 
‘Database of databases’ that is specific to its 
operations. It would also allow for a much more 
extensive selective search function. However, the 
user interface would probably be of a lower standard 
than InfoMapper™. 

Input into the database is very time-consuming. 
Because data input is based upon the six-page 
questionnaire generated by the system and mirrored 
on-screen, it is very monotonous to use. Although
data editing is essential, it is very difficult to do
efficiently because of the size of the entry
format.

IRM seeks to harness information resources. 
InfoMapper™ and mapping are trying to do this. 
However, from our results, it appears the software 
needs further development. Thus, information 
managers still have some way to go before information 
mapping techniques and similar information resource 
methodologies can be effectively applied within ‘real-world’
environments.

Any IRM technique has to be fully functional, 
adaptable, flexible and workable to achieve its 
objectives effectively and efficiently. We believe 
that the current InfoMapper™ probably fails to fulfil 
those criteria. The software should be further tested
within organizations of varying nature and purpose.
Doing this will allow the software to be fine-tuned
and its benefits and drawbacks to be fully recognized.


Abstract


The paper describes various database maintenance functions of Dobis/Libis. In addition, it discusses two authority control
components: (1) adding cross-references to provide linkage between headings, and (2) adding authority notes that record
local decisions about the treatment of series, both name and title.


Introduction 

Since the advent of library automation, much has
been written in the professional literature on such
topics as retrospective conversion, online catalogues
and their use, and the quality of records in bibliographic
databases. The quality issue continues to surface
as a serious concern. Surveys are being conducted to
determine the nature and number of errors in
databases¹ and to find ways of correcting
them.

The process of making necessary corrections,
eliminating unwanted records, and linking old
and new, and correct and incorrect, headings in the
database through cross-references is called database
maintenance. There are, therefore, three major
components of database maintenance: (1) correction 
of errors, (2) deletion of records, and (3) application 
of authority control. 

The emphasis libraries are placing on database
maintenance is reflected in the creation of new
sections in the cataloguing department, such as a
‘Database maintenance section’ or ‘Authorities
section’, staffed by one or more FTE staff
with job titles such as ‘Database maintenance librarian’,
‘Authorities librarian’ or ‘Authorities cataloguer’.

Effective database maintenance is dependent, 
among other things, upon the database maintenance 
features of the software. 

Dobis/Libis is one software package that has excellent
database maintenance features, and we review those
features in this paper.


Dobis/Libis 

Dobis/Libis is a software package developed jointly 
by the University of Dortmund in Germany and the 
Catholic University of Leuven in Belgium and 
marketed until recently by IBM. A new company,
Extended Library Access Solutions (ELiAS), has taken
over the rights to develop and market Dobis/Libis
as of January 1993. Dobis/Libis is an integrated
library automation system that runs on IBM 
mainframe computers. The latest version of the 
software is version 2.2. 


Structure of the Dobis/Libis Database 

The Dobis/Libis database consists of two main sets of files:
(1) system files and (2) local files. The files are 
actually indexes to bibliographic records. The system 
files are indexes to bibliographic records in the system 
catalogue which is at the top level of a Dobis/Libis 
network and is shared by all libraries in the network. 
The local files are indexes to bibliographic records in 
the local catalogue maintained at the second level for 
each participating library in the network. 

There are nine indexes, called access-point files (APFs):
name, title, subject, publisher, classification,
ISBN/ISSN, national record number, other entries,
and document number. There are two additional
access-point files: (1) the abstract word file, containing
words from the abstracts as an index to abstracts and
their documents, and (2) the bibliographic pool file
of MARC records, which resides adjacent to the catalogue
and can be searched by the same access points as
records in the system catalogue. Pool records can be
used as sources of cataloguing data for either
retrospective conversion or acquisitions orders.²


Database Maintenance 

Dobis/Libis ensures database security. Authorization
levels ranging from 50 to 1,000 allow users to
perform functions ranging from simple to complex.
Beginners normally start at authorization level 200
and are given higher levels as they gain experience.
In the case of database maintenance, Dobis/Libis is
even more restrictive. Dobis/Libis has two
maintenance functions, which are performed with
varying levels of authorization: (1) the Catalogue
maintenance function, to delete access-point file entries
and local documents, and (2) the General maintenance
function, to delete document numbers from such files
as vendors, funds, fines, etc. Catalogue maintenance
is a subfunction of the cataloguing module and is
normally used by cataloguers. General maintenance,
by contrast, is a separate function designed to be used
by systems personnel, who have a better understanding
of the system design and file structure. Each
of these two functions is discussed in detail below.


Catalogue maintenance

There are three major components of the Catalogue 
maintenance function in DOBIS (Figure 1): (1) 
correction, (2) deletion and (3) transfer. 


Correction 

This subfunction is mainly used to correct
misspellings in access-point file entries. It can also
be used to change outdated headings, provided no
cross-references or authority notes are attached to
the headings. If cross-references or notes are attached,
they must first be deleted before the headings are
corrected; afterwards, the cross-references and notes
can be re-attached to the corrected headings. Because
records are linked to access points via pointers, when
an individual access point is changed, all the records
attached to it are updated as well.
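Why a single correction reaches every attached record can be shown with a toy model in which records hold pointers (ids) to access-point entries rather than copies of the heading text. This is a minimal sketch under that assumption; `AccessPoint`, `Catalogue` and their methods are hypothetical and do not describe how DOBIS physically stores its files.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPoint:
    heading: str
    cross_refs: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


class Catalogue:
    """Toy model: records store pointers (ids) to access points, never copies of the heading text."""

    def __init__(self) -> None:
        self.apf: dict[int, AccessPoint] = {}       # access-point id -> entry
        self.records: dict[int, list[int]] = {}     # record id -> access-point ids it points to

    def headings(self, record_id: int) -> list[str]:
        return [self.apf[ap_id].heading for ap_id in self.records[record_id]]

    def correct(self, ap_id: int, new_heading: str) -> None:
        entry = self.apf[ap_id]
        if entry.cross_refs or entry.notes:
            # Mirrors the rule above: detach cross-references and notes first.
            raise ValueError("delete cross-references and notes before correcting the heading")
        entry.heading = new_heading   # one change, seen by every record that points here


cat = Catalogue()
cat.apf[1] = AccessPoint("Mathmatics")     # misspelled heading
cat.records = {10: [1], 11: [1]}
cat.correct(1, "Mathematics")
```

After the single call to `correct`, both records 10 and 11 display the corrected heading, with no record-by-record editing.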


Deletion 

Several types of deletion can be performed under
the Catalogue maintenance function of DOBIS,
including deletion of access-point file entries,
abstract words, local documents and copies. It is
important to note that an APF entry cannot be deleted
under two conditions: (1) a document is attached to it,
or (2) cross-references or authority notes are attached
to it. While the Delete access-point file entry
subfunction is used to delete entries one by one, the
Delete unused entries subfunction is used to delete
unused entries in the entire file, or in a portion of the
file, in one pass. To execute it, the user enters the
beginning and ending entries of the range. This
subfunction should not be used when the subjects
file contains thesaurus entries that are to remain in
the file even though no documents are associated
with them.³

Misspelled words in abstracts, and words mistakenly
broken between the end of one line and the beginning
of the next, become index words in the Abstract word
file. Correcting these words in the abstract does not
automatically update the Abstract word file, so the
incorrect words stay in the file alongside the correct
words unless they are deleted from it with the Delete
abstract word subfunction.

A record in the local catalogue can be deleted with
the Delete local document subfunction. However, the
record cannot be deleted until all attached copy records
have been deleted. If someone tries to delete a record
without first deleting the attached copy records, the
system gives the error message ‘Entry in use’.


Cataloguing: Catalogue maintenance

English language files
  1 Correct access-point file entry
  2 Delete access-point file entry
  3 Transfer document list
  4 Delete unused entries

Arabic language files
  6 Correct access-point file entry
  7 Delete access-point file entry
  8 Transfer document list
  9 Delete unused entries

Other functions
 10 Delete local document
 11 Delete copy

Enter number or code

Figure 1. Catalogue maintenance subfunctions








To delete a copy record, there is a separate
subfunction called Delete copy.


Maintenance

System files
  1 Translate APPTR to key
  2 Delete document list entry
  3 Delete document
  4 Delete AP file key
  5 Delete permutation
  6 Delete key using APPTR

Local files
  7 Translate APPTR to key
  8 Delete document list entry
  9 Delete document
 10 Delete AP file key
 11 Delete permutation
 12 Delete key using APPTR

Other functions
 13 System user names
 14 System user numbers
 15 Authorization levels
 16 Delete passwords
 17 Display file information
 18 Display queue information
 19 Initialize permute queues

Enter number or code

Figure 2. General maintenance subfunctions
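The dependency rules governing deletion under Catalogue maintenance (an APF entry with documents, cross-references or notes cannot be deleted; Delete unused entries works over a range; a local document cannot go while copies remain) can be modelled compactly. This is a sketch under stated assumptions: `APFEntry`, `LocalCatalogue` and their methods are hypothetical names, the range is taken to be alphabetical, and nothing here reflects DOBIS internals.

```python
from dataclasses import dataclass, field


class EntryInUse(Exception):
    """Deletion refused because something is still attached (cf. the 'Entry in use' message)."""


@dataclass
class APFEntry:
    documents: set[int] = field(default_factory=set)
    cross_refs: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


class LocalCatalogue:
    """Hypothetical toy model of the deletion rules, keyed by heading and document id."""

    def __init__(self) -> None:
        self.apf: dict[str, APFEntry] = {}       # heading -> entry
        self.copies: dict[int, set[str]] = {}    # local document id -> copy ids

    def delete_apf_entry(self, heading: str) -> None:
        entry = self.apf[heading]
        if entry.documents or entry.cross_refs or entry.notes:
            raise EntryInUse(heading)
        del self.apf[heading]

    def delete_unused_entries(self, first: str, last: str) -> list[str]:
        """Delete every entry from `first` to `last` with no documents attached.

        Document-less thesaurus terms are removed too, hence the caution about subject files.
        """
        doomed = [h for h in sorted(self.apf) if first <= h <= last and not self.apf[h].documents]
        for h in doomed:
            del self.apf[h]
        return doomed

    def delete_copy(self, doc_id: int, copy_id: str) -> None:
        self.copies[doc_id].discard(copy_id)

    def delete_local_document(self, doc_id: int) -> None:
        if self.copies.get(doc_id):
            raise EntryInUse("Entry in use")
        self.copies.pop(doc_id, None)
```

The order of operations follows the text: copies first, then the local document; documents detached first, then the heading.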


Transfer 

The Transfer document list subfunction allows one to
move the documents associated with one entry in an
access-point file to another entry in the same file.
This subfunction is especially useful when performing
authority control. For example, if both old and new
headings (names, subjects or uniform titles) exist
in the database, the first step would be to transfer
the documents associated with the old heading to the
new heading. This leaves the old heading with
no documents. In the second step, a cross-reference
can be added to link the old heading with the new
heading. How a cross-reference is added in DOBIS
is discussed later.
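The first step of that procedure amounts to moving a set of document numbers from one heading to another within a single file. A minimal sketch, assuming an access-point file can be modelled as a mapping from heading to document numbers; the function name and the two headings are hypothetical.

```python
def transfer_document_list(apf: dict[str, set[int]], old: str, new: str) -> None:
    """Move every document attached to heading `old` to heading `new` in the same access-point file."""
    if old == new:
        return
    apf.setdefault(new, set()).update(apf[old])
    apf[old].clear()   # the old heading is left with no documents, ready for a cross-reference


apf = {"Railroads": {1, 2}, "Railways": {3}}
transfer_document_list(apf, "Railroads", "Railways")
```

The emptied heading is deliberately kept rather than deleted, because the second step attaches a cross-reference to it.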


General maintenance 

The General maintenance function of Dobis/Libis is
a very powerful function, to be used only by systems
librarians. It comprises several subfunctions (Figure
2) which, among other things, allow users to delete
documents from various system or local files. Great
care should be taken in using these subfunctions, as
serious damage could be done to the database if
mistakes are made.

The function also includes some subfunctions 
which are not directly related to the maintenance of 
the database and therefore will not be discussed here. 
They are used to add names to the DOBIS user file, 
assign authorization levels to users in various 
functions and display file information, etc. 

As stated earlier, most of the General maintenance
subfunctions perform a single operation: deleting
records, whether documents, access-point file entries
or permutations. The difference between the Catalogue
maintenance and General maintenance functions is
that the Catalogue maintenance subfunctions perform
access-point file maintenance, a process limited to
the cataloguing module, whereas the General
maintenance subfunctions are used to maintain the
entire database, comprising the files of the
acquisitions, circulation, periodicals and cataloguing
modules.

Dobis/Libis requires periodic reorganization of
files to redistribute records from the overflow blocks
between the original and new blocks. During the
process, any documents with missing pointers between
the bibliographic and documents files are identified
and printed in a report, so that they can be deleted
with the General maintenance function. If not deleted,
these records accumulate and cause problems for the
database later.
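The check performed during reorganization, finding records whose pointers between the bibliographic and documents files no longer match, can be sketched as a two-way consistency test. The layout assumed here (each bibliographic record pointing to one document id, and each document pointing back) is a simplification for illustration, and `find_orphans` is a hypothetical name.

```python
def find_orphans(bib_records: dict[int, int], document_file: dict[int, int]) -> list[tuple[str, int]]:
    """Report records whose pointers between the two files do not match.

    bib_records:   bibliographic record id -> document id it points to (assumed layout)
    document_file: document id -> bibliographic record id it points back to
    """
    report = []
    for bib_id, doc_id in sorted(bib_records.items()):
        if document_file.get(doc_id) != bib_id:
            report.append(("bibliographic", bib_id))
    for doc_id, bib_id in sorted(document_file.items()):
        if bib_records.get(bib_id) != doc_id:
            report.append(("document", doc_id))
    return report
```

For example, with `bib = {1: 100, 2: 200}` and `docs = {100: 1, 300: 3}`, record 2 points to a missing document and document 300 points to a missing record, so both appear in the report for deletion.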

One of the General maintenance subfunctions is
Delete current passwords. The need to delete a
password arises when users cannot recall their
password. In this case the old password is
deleted and a new password is assigned to the
user.


Authority control
Authority control is another important component of
database maintenance that ‘collects, records, and
maintains the authorized forms of headings to be
used as access points and those headings which act as
see and see also references to lead the user to the
authorized heading.’⁴ As stated elsewhere, the design
of the Dobis/Libis database is such that all
bibliographic records depend on pointers to a series
of indexes, which are maintained as separate,
independent files. Before new headings are added to
these files, especially the main files (names, titles
and subjects), their correct forms have to be
determined. Because only authorized headings are
added, Dobis/Libis access-point files are also called
authority files. This does not mean, however, that
headings once entered will never be changed.
In fact, because of changes in the cataloguing
rules that govern the forms of both personal and corporate
names, or changes in subject headings, series and
uniform titles, Dobis/Libis files need to be reviewed
regularly to incorporate these changes. The process
of incorporating changes to ensure consistency in
headings is the main function of authority control.

There are two major components of authority
control in Dobis/Libis:
(1) adding cross-references to provide linkage
between headings, and
(2) adding authority notes.

Each of these two components will be discussed 
in the following paragraphs. 


Cross-references 

Authority records are created in Dobis/Libis
in a way that differs from other systems. The authority
record in DOBIS consists of one or more types
of cross-reference attached to headings in
access-point files. Headings with attached authority
records are kept in their respective files and are
identified by an ‘X’ preceding the heading. The
types of cross-references in Dobis/Libis correspond to
the types provided in Library of Congress authority
records, although the codes are different, as the
following table shows:


Dobis/Libis                           Library of Congress
See                                   Use
Seen from                             UF (Use for)
See also                              SA (See also)
Seen also from                        RT (Related topic)
Earlier called                        Earlier heading
Later called                          Later heading
Broader heading                       BT (Broader topic)
Narrower heading                      NT (Narrower topic)
Also as subdivision under the topic   See also subdivision under the heading
See also the general heading          BT (Broader topic)

Cross-references are added to system and local files
using two separate cataloguing subfunctions: (1) System
cross-references and (2) Local cross-references.
Selecting the cross-references function displays a
screen that lists three files: names, titles and subjects.
After selecting the desired file, the operator enters a
search term to select the heading from which a
cross-reference is to be made to another heading. If
the heading does not exist, it can be added on the spot.
After the heading is selected, the system displays the
cross-reference types screen (Figure 3), on which the
number of the desired type is entered. Finally, the
heading to which the cross-reference is to be made is
selected from the file. See Figure 4 for the
cross-references added to a heading.

Cross-references in Dobis/Libis are always
bidirectional. Thus, when an operator creates a ‘see’,
‘see also’, ‘earlier called’, ‘broader heading’ or ‘also
as subdivision under the topic’ reference, the system
supplies the corresponding ‘seen from’, ‘seen also
from’, ‘later called’, ‘narrower heading’ or ‘see also
the general heading’ reference. If such a reference is
deleted, the corresponding ‘from’ reference is
automatically deleted.⁵


Cataloguing: Cross-references

Cross-references
  1 see                                    6 seen from
  2 see also                               7 seen also from
  3 earlier called                         8 later called
  4 broader heading                        9 narrower heading
  5 also as subdivision under the topic   10 see also the general heading

Enter number

Figure 3. Cross-reference types


Catalogue search: Subjects
Cross-references
Set theory
  1 seen from           Aggregates
  2 broader heading     Mathematics
  3 narrower heading    Algebra, Abstract

Summary
Enter number or code

Figure 4. Cross-references


Authority file note

On searching an access-point file entry, Dobis/Libis
does not display on the first screen some of the
information related to the entry, such as type, subfield
codes, permute language, number of cross-references,
and notes. Instead, it provides a ‘d’ (detail) code to
display this information in the next step. One of the
fields on the detail panel is the authority file notes
field, which contains four note subfields. This field is
used to record decisions about the entry. For
example, a decision on how a title or name series is
to be treated can be recorded here (Figure 5), so that
when later volumes of that series arrive, the
cataloguer can consult the authority note before
cataloguing them. In a manual system, these decisions
are normally kept on cards filed in a file called
the series authority file. The authority file notes help
maintain consistency in entries, especially for name and
title series.


Catalogue search: Titles
Documents: 0
Notes
  note   Catalogued together, Analysed
  note   QA3.L28
entry extension
Make appropriate entry

Figure 5. Series authority note
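The bidirectional rule for cross-references (the system supplies the reciprocal reference and deletes it along with its partner) can be sketched with the ten types from the cross-reference types screen. The pairing of types 1 to 5 with types 6 to 10 comes from the text; treating a directly chosen type 6 to 10 symmetrically, the `CrossReferences` class, and its methods are assumptions for illustration. The sample headings are taken from the cross-references screen shown for ‘Set theory’.

```python
# Pairs from the cross-reference types screen: operator's choice (1-5) -> system-supplied type (6-10).
PAIRS = {
    "see": "seen from",
    "see also": "seen also from",
    "earlier called": "later called",
    "broader heading": "narrower heading",
    "also as subdivision under the topic": "see also the general heading",
}
# Assumption: choosing one of types 6-10 directly supplies the matching type 1-5.
RECIPROCAL = {**PAIRS, **{v: k for k, v in PAIRS.items()}}


class CrossReferences:
    """Toy store in which every reference always exists together with its reciprocal."""

    def __init__(self) -> None:
        self.refs: set[tuple[str, str, str]] = set()   # (heading, reference type, target heading)

    def add(self, heading: str, ref_type: str, target: str) -> None:
        self.refs.add((heading, ref_type, target))
        self.refs.add((target, RECIPROCAL[ref_type], heading))

    def delete(self, heading: str, ref_type: str, target: str) -> None:
        self.refs.discard((heading, ref_type, target))
        self.refs.discard((target, RECIPROCAL[ref_type], heading))

    def for_heading(self, heading: str) -> list[tuple[str, str]]:
        return sorted((t, target) for h, t, target in self.refs if h == heading)


xr = CrossReferences()
xr.add("Aggregates", "see", "Set theory")
xr.add("Set theory", "broader heading", "Mathematics")
```

Adding ‘Aggregates see Set theory’ thus also produces ‘Set theory seen from Aggregates’, and deleting either one removes both, so the two files can never disagree.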


Conclusion 

The quality of the database is getting a great deal of
attention as more and more libraries complete the
retrospective conversion of their card catalogues.
While a high-quality database requires time and
effort, it also requires software with excellent
database maintenance features. Dobis/Libis has
sophisticated features for performing database
maintenance and authority control activities at
various levels of authorization, and these levels
ensure the security of the Dobis/Libis database.
All database maintenance activities in DOBIS are
performed in real time, which means that all
operator entries result in immediate updates.


Abstract


The definition of the ‘information industry’ is given. The state of the art of China's information industry is
described. The problems and the opportunities for its development are analysed. Finally, some considerations on its
development are put forward.


Connotation of Information Industry 
Information industry (II) in this paper refers to such
comprehensive production activities and infrastructures
as the research, development and application
of information technology, and information services
oriented to economic development and public needs.
II can be divided into two broad categories (see 
Fig.1): information technology and manufacturing 
industry, and information service. The former includes 
microelectronics, computer technology, communi- 
cation technology, multimedia technology, A/V 
technology, microcopying and photocopying 
technology, electronic publishing as well as infor- 
mation equipment and devices associated with those 
technologies. The latter may be further divided into 
traditional information services (information services 
and consulting activities based on printed scientific 
information, books, files, standards, patents and so 
on) and emerging electronic information services, 
which include computerized information process- 
ing, development and application of databases, 
software production, electronic publications, 
communication and networking systems, office 
automation and other information services and 
consulting activities based on the intensive utiliza- 
tion of computers and communication networks. 

In China, the definition and the connotation of II have not yet been agreed upon. In this paper, the above definition of II is used as the basis for discussion.


China has to accelerate the development of II

Information is an important resource and form of 
wealth for any country and is the necessary basis for 
scientific, technological, as well as socio-economic, 
development. Therefore, II should be one of the mainstay industries in an information society. In today's world, the development level of II has become a significant measure of a country's development level and comprehensive power. Both industrially advanced countries and developing countries with fast economic growth have been promoting their II since the beginning of the 1990s (see Table 1).


Table 1.
Programmes for II development in selected countries

Country | Year | Programme | Investment
USA | 1992 | High Performance Computing and Communication Programme | US$ 638m
USA | 1993 | Super High Speed Information Highway Programme | US$ 803m
EC | | Computer Programme | 3.5bn ECU
UK | on-going | Information Technology Research Programme | £2bn
France | 1993-1996 | Information Technology Programme |
Japan | 1992 | Neuronetwork and Optical Computer Programme | 900m yen/year
Korea | 1993-1996 | Information Technology Programme | US$ 1.9bn
Singapore | 1980-2000 | Information Programme |


To maintain leadership in II, the Bush administration listed the high performance computing and communication programme as one of the three major R&D programmes and invested $638 million in it in 1992. The Clinton administration put forward a so-called ‘Super High Speed Information Highway’ programme to further the work of the high performance computing and communication programme; in 1993, $802.9 million was set aside for it in the federal budget. The European Community allocates 3.5bn ECU to its computer programme. Great Britain is carrying out its information technology research programme with £2bn. France has begun its information technology programme (1993-1996). Japan started its 21st-century-oriented neuronetwork computer and optical computer programme in 1992.
Figure 1. Connotations of II
Korea is putting $1.9bn into its information technology programme (1993-1996). Singapore is implementing its Informationization Plan (1980-2000). These examples from the international community indicate that II is regarded as one of the leading industries necessary for developing a modern economy at the end of this century and the beginning of the next. II will play an increasingly important role in economic growth. Table 2, which reflects the opinions of Japanese experts, shows the major industries with a great impact on the economy in different periods.
According to statistics released by the International Association of Information Industry, II in the world as a whole has grown remarkably since the beginning of the 1990s. At present, the output value of hardware in the global electronics and information industries is close to $900bn. Sales in the global software market total $120bn, and the information service market totals $18bn. Investment in II by the USA and Japan in 1992 increased by 15% and 40% respectively over 1991. These data clearly indicate that II is a prosperous industry and essential to economic development.
Table 2.
Industries with an essential impact on economic development

Time | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5
... | iron and steel | general machinery | petroleum, coal | chemicals | automobiles
1970s | automobiles | electronics, communication | household appliances | chemicals | precision machinery
... | ... | communication | household appliances | office supplies, service industry | ...

The Chinese government has recognized the implications of II for the four modernizations and for boosting the national economy. In 1986, the Chinese government introduced the ‘High Tech Research and Development Programme’, which treats information technology as one of its priority fields. In the past five years, China's II has grown faster than in earlier periods. However, because of its weak foundations and reduced investment, China's II lags behind some other developing countries, let alone advanced ones. The major problems impeding China's II can be summarized as follows:

1. Small scale, poor foundation and backward 
facilities 

China's computer and communication industries are relatively backward. For instance, the Galaxy-II emulation computer, which performs 1bn operations per second, is the newest giant computer developed in China, while in the USA super-gigantic computers capable of 1 trillion operations per second are being designed.

In 1991, global II (including consulting services) produced an output value of $203bn, while China's corresponding output value was only a little more than 3bn yuan, which accounts for merely 0.3% of the world total and 0.2% of China's GNP. This low percentage reveals a big gap between China and the advanced countries. China has not invested heavily in the infrastructure of information systems. The figures for 1986 are an example: in that year, China imported only $5.57m worth of books and journals, less even than Singapore ($10.68m) or India ($9.91m). A related problem is the imbalance between investment in hardware and investment in software. In the seventh Five-year Plan period, various government agencies invested about 20bn yuan in building electronic information systems, but only 5-10bn of this was used for database development. As a result, many imported computers stand idle.

The major part of China's information processing is still manual. Some 90% of information sources have not been processed electronically, and their utilization is inefficient. China's communication technology is not advanced. Electronic mail, electronic data interchange, teletext, electronic bulletin boards, CD-ROM databases and similar technologies or services that are popular in advanced countries are either non-existent in China or only at an early stage of development.


2. Inadequate information resources and low 
utilization of them 

China's per capita utilization of information resources is two to three orders of magnitude lower than that of developed nations. The total number of databases in China is less than 1% of the world total. For various reasons (detailed below), even these limited information resources have not been fully utilized.


3. Lack of coordination among government agencies

No single government agency is responsible for national information systems. Instead, many ministries and other government agencies have established their own information systems, focusing respectively on science and technology, the economy, society, finance, management, journalism and so on. Many information sources that should be available to the public are restricted to internal use. As a result, shortages of information resources and idle information resources coexist. Because there is no overall planning or organizational coordination at the national level, different government agencies have built information systems with duplicate services or functions, and these systems are of low quality and poorly standardized.


4. Poor information awareness among the public 

During the long period when the command economy was dominant, Chinese people became estranged from the idea of market competition and, as a result, insensitive to information needs. The value of information and of information services has gone unrecognized, and information services have been unable to charge reasonable prices for what they deliver. It is estimated that since 1949 the economic losses caused by lack of information, and by the wrong decisions that resulted, amount to at least 1.3 trillion yuan for China as a whole.


5. Immature information market 

Because the value of information has not been fully appreciated, no reasonable pricing system for information is in place, and there is no effective market management mechanism, China's information market is immature, to say the least. Although there are numerous information and consulting agencies, few are truly independent, specialized information agencies. At present, the average quality of information staff is poor, and so is the quality of their services and products. In addition, the conditions for commercializing information are far from ideal, and copyright protection is less effective than expected. The information market is therefore maturing only slowly.


6. National information policy needs to be adjusted 
and intensified 

There is a negative tendency in national information policy making, namely contentment with general, macroscopic policies combined with a failure to formulate definite and forceful policies and measures for concrete problems in practice. For instance, how should the state provide reasonable support to non-profit, social-benefit-oriented information services? What national policy should the State Planning Commission, the Ministry of Finance and the SSTC adopt towards the building, development and utilization of national information resources? In building communication networks, what is the appropriate relationship between the government agencies responsible for telecommunications on the one hand and the information services of different sectors on the other? Should the state give substantial support to the database industry at its initial stage or not? In short, government decision makers have underestimated the difficulties and problems in the infrastructure of II in general, and of the information services industry in particular. Policies for dealing with concrete problems in practice need to be defined and strengthened.

The above analysis shows that both international trends in II and China's practical needs make it necessary to accelerate the development of China's II. To do so, these problems must be dealt with in a gradual but accelerating way.


Opportunities for China to develop II 

For any country, the development of any industry, especially one with high added value, is constrained by a series of factors, the three most important of which are: demand arising from socio-economic development; the physical conditions that support the industry; and the policy environment for developing it (see Fig.2).

Figure 2. Three basic conditions for industrial development
In the 1990s, all three of these conditions for the development of China's II have come into being. Therefore, despite many problems and difficulties, China's II also faces favourable historical opportunities.


1. Ten years of reform and opening have led to rapid economic growth in China, with an average annual growth rate of 5%. The growth rate in 1992 was as high as 12%. Both traditional and high-tech industries are full of vitality. Rapid economic growth, the revitalization of different industries, the transformation of traditional industries and the readjustment of industrial structures all call for an information industry. The market economy, competition mechanisms, the technological transformation and upgrading of large and medium-sized enterprises, the internationalization of the national economy, the modernization of science and technology, and urgently needed public service systems all require advanced information technology, information equipment, and effective information circulation and services. As China's economy develops, the social form and production structure characteristic of an information society have also begun to take rudimentary shape in China.


2. To develop II, a country must not only have a need for it arising from its socio-economic development, but also possess certain physical conditions. Through economic construction during the eight Five-year Plans, and particularly through reform and opening in recent years, China now possesses some of the physical conditions that are prerequisites for building up II. The following data describe several aspects of the fundamental physical conditions that China possesses.


• During the seventh Five-year Plan period, the Chinese government invested more than 20bn yuan to build a dozen national information service systems, focusing respectively on the economy, S&T, statistics, banks, posts and telecommunications, electric power, railways, civil aviation, customs, weather forecasting, population and so on. These systems constitute the basic framework for a national comprehensive information service system.


• As far as the national data communication network is concerned, China has established CHINAPAC, which has 32 nodes and covers 31 province-level administrative areas.


• The national satellite communication system has taken rudimentary shape, with 5 earth stations and 35 thousand surface receiving stations. Direct communication with more than 50 nations is possible through this system.


• There are more than 200 specialized software companies or agencies. Their output value in 1991 was 900m yuan, accounting for 12.3% of the output value of China's computer industry.


• There are about 800 relatively viable databases, containing altogether 50m records or entries. According to 1992 statistics, these databases are oriented towards the following fields: culture and education (208 databases); S&T (125); planning and statistics (68); resources (52); energy (42); commerce and trade (38); posts and telecommunications, journalism (30); health and sport (26); metallurgy (22); agriculture and forestry (19); labour and wealth (17); space and aviation (15) (see Table 3).


• China's information service industry is growing at an annual rate of 25-30%. The turnover of electronic information services in 1989, 1990 and 1991 was 1.5bn yuan, 2bn yuan and 2bn yuan, respectively.
Table 3.
Distribution of databases by subject

Field | Percent | Field | Percent
Government | ... | Agriculture and forestry | ...
Labour and wealth | ... | Resources | ...
Planning and statistics | ... | Transport | ...
Finance | ... | Light industry | ...
Culture and education | 25.84 | Metallurgy | ...
Machinery and electronics | ... | |
Posts and telecommunications | ... | Architecture, environment | ...
Commerce and trade | 4.72 | Other | 1.37


Table 4.
Selected data of China's S&T information service industry

1. Independent S&T Information Agencies
No. of agencies 414
Staff 26 thousand

2. Non-independent Information Agencies
No. of agencies 4000
Staff 54 thousand

3. Resources Owned and Services Provided by Above Agencies
Books and journals 15m titles
Research reports 4.31m copies
Patents 2.2m copies
SDI 224 thousand items

4. No. of Archives 3522
Archive documents 114 million

5. No. of Libraries 2535

6. No. of Publishing Houses 350
Annual publication 80 thousand titles
Newspapers 1486
Registered S&T journals 5,880 titles
No. of papers published in those journals each year 200 thousand

In 1991, 11,733 scientific papers were covered by major international indexes such as SCI, ISTP and EI. China ranked 15th in the world in terms of scientific publications.





• China's S&T information service industry has a longer history than the newer information services. Despite the current tight budget, it has a relatively solid foundation, built through decades of effort (see Table 4 and Table 5).

3. The macroscopic policy environment for developing II is steadily improving.

• As early as 1984, Mr Deng Xiaoping gave an important instruction: ‘Tap information resources and serve the four modernizations’. President Jiang Zemin has repeatedly emphasized that ‘each of the four modernizations depends on informationization’. Their instructions have given great momentum to the development of China's II.

Table 5.
Selected data of China's electronic information services

Distribution of online service systems
Cities covered
Terminals

International online access
Hosts connected
Databases accessible

Domestically registered databases
of which S&T and engineering databases

Online access to Chinese databases
Remote terminals
Searches completed


• In June 1992, the Chinese government decided to expand the tertiary industries, at the core of which is the information industry.


• In 1986, China introduced the High Tech Research and Development Programme, selecting information technology as one of its priority fields. Specifically, China will focus on the following subfields at the beginning of the next century: intelligent computer systems; opto-electronic devices; integration of microelectronic and opto-electronic systems; and information access and processing technology.


• The State Council has just decided to conduct a survey of the tertiary industries. This survey will have implications for the tertiary industries in general, and for the information industry in particular.


Considerations on developing China's II

As mentioned above, China's economy has reached a stage at which accelerated development of II is necessary. We face many problems and difficulties, but also rare development opportunities. In this situation, it is a pressing task to define and categorize II, to formulate development plans and strategies for it, to set goals to be accomplished by the year 2000, and to establish a series of related regulations and policies.

It is suggested that in the 1990s, we should pay 
sufficient attention to the following five aspects: 


1. The computer industry covers many areas and has multiple objectives. For present-day China, however, limited goals and clear priorities should be emphasized. Specifically, minicomputers and microcomputers should be given priority, and the emerging software industry should be expanded, so as to match China's intellectual resources to the largest demand in China's market. If minicomputers and microcomputers come into popular use in different industries, China's productivity may increase greatly, and China's computer industry may expect to move to the forefront of the world in scale. Meanwhile, China's information service industry will gain a solid foundation for development.


2. On the precondition that communications technology and facilities are accelerated through unified leadership and planning, we have to start building national communication, public data transmission and multimedia information networks. Without these networks, China's information resources cannot be fully tapped and utilized, and information services cannot develop smoothly. Since the 1980s, China has established an online retrieval network that can access databases provided by international information hosts. Unfortunately, there is as yet no national online system for accessing domestic databases. The major problem is that network building and database building have failed to grow hand in hand. If China's II cannot grow rapidly within this century, the development of other industries, and even of the whole national economy, will be severely impeded.


3. The database industry is an integral part of II and the foundation of information services. Newly emerging electronic information services and various kinds of consulting services all rely on databases for support. It is therefore necessary and pressing to develop China's database industry to meet user needs both at home and abroad.


4. Information consulting services are the most important value-added production activity. In fact, one
fundamental objective of information technology 
and information equipment manufacturing is to serve 
the effective dissemination and utilization of infor- 
mation and knowledge. Therefore, China should boost 
the development of information consulting services. 
The most urgent tasks are: First, our work focus 
should change from traditional information services 
to value-added information consulting services. 
Second, we should construct a number of major 
information delivery and consulting service centres 
to serve economic construction and other social 
activities. Third, we should encourage information 
workers to establish a large number of small-scale, 
specialized consulting services in accordance with 
market needs and their subject expertise. Let the 
market select ‘the fittest’ among them. 


5. To accelerate China's II, at least the following five problems, which are widely recognized in information circles, should be solved more quickly or given sufficient attention, as they have become severe obstacles to the development of II in China since the late 1980s.


• The government must increase capital investment in II, especially in the information services industry.

• The government should strengthen its planning and control of the infrastructure of the information services industry as soon as possible. Unless one-off developments in individual government agencies are abandoned, it will be impossible to establish a national comprehensive information system and network that is compatible and shared by all members.

• The information legislation, standardization and normalization needed for the development of II should be worked out.

• International cooperation and exchange need to be improved so that China's II can target the international market, and China's intellectual property environment should also be improved.


• We should make efforts to train the high-quality staff needed for the development of II.
Introduction
In this paper I shall try to define the ‘world information 
industry’, and to outline its present situation. 

Then I shall look at the implications for potential 
purchasers: 

• What sort of relationship should they seek with the industry for their own advantage?

• What makes for success in relations with the information industry?

• The criteria which successful organizations use in selecting products from the industry to help in developing their own use and management of information.

• How they set about selecting, acquiring and using the products.

Finally, I shall suggest some developments in the 
information industry that seem important for what 
China is currently seeking to do in information 
services and information management. 


The world information industry 

Definitions

This is the definition of the world information industry 
that I shall be using: 

Those industries in all countries which 
manufacture or create for the market informa- 
tion services or information products, which 
can support individuals and organizations in 
doing the things with information that they 
need to do in order to achieve their work 
objectives. 

This definition largely coincides with the 
description of the information service industry put 
forward by Mr Liu Zhaodong of the Institute for 
Scientific and Technical Information of China at the 
recent Sino-British Symposium on Information 
Management.* 

And when I speak of information, I mean:

Whatever individuals and organizations need
to maintain the knowledge and know-how
they require to achieve their goals.

It follows from that definition that information is:

Knowledge that has been organized and made
visible (usually in the form of information
products, like monographs, periodical
articles, or databases) so that it can be
communicated from those who have it to
those who need it.


Information services are: 

Those services which are provided to help 
organizations, businesses of all kinds, and 
individuals to find and use the information 
they need to achieve their work objectives. 
The services may be provided by outside 
specialist institutions, or they may form an 
internal part of businesses and organizations. 


And information management is concerned with: 

• How information is acquired, recorded and stored

• How it is used and communicated

• How those who handle it apply their skills and cooperate with one another

• How information technology is used in all these activities

• How effectively information-related activities contribute towards achieving the objectives of organizations and individuals

• The costs and benefits of information activities.


The definitions I have just given place emphasis on the content of information, that is, what enterprises need to know and apply in order to succeed in meeting their aims.

The role of all kinds of information technology is 
that of an infrastructure to support people in using 
knowledge. 

So there must be a balance between investment 
in: 

1 Getting information, adding value to it, and 
delivering it to the market 

and 

2 The technology to support those activities. 


Today's world information industry 

For many centuries there have been industries which 
fulfil some part of the role I have defined for the 
information industry, in particular those concerned 
with printing and publishing. Today, however, as 
Poirier (1990) points out, the core is those industries 
whose products and services are based on digitized 
information. And in those industries there are 
constantly changing fortunes and changing structures. 
Long-established and powerful companies like IBM decline in conditions of economic recession and strong competition; businesses combine in new ways, sometimes with their competitors, to offer new services, taking advantage of changes in the technology, for example in electronic document delivery and electronic publishing. Technological innovation, too, leads to many information goods and services competing against each other, for example:

• Printing, publishing and electronic publishing

• Postal services, fax, e-mail, electronic data interchange

• CD-ROMs and online databases.

In the telecommunications industry, particularly, 
the last 10 years have brought new players into the 
world industry, more internationalization, more 
flexible regulations, and greater competition. Wide 
Area Networks (WANs), as Ashford said in his 
contribution to the recent Beijing Symposium on 
China's information strategy, today offer the 
possibility of a revolutionary expansion of wide area communications, based on satellite channels and optic fibre networks; and for the late 1990s there is the promise of developments that incorporate sufficient bandwidth to serve telephone, telex, circuit-switched digital data, TV and radio on a common digital system at realistic transmission speeds and improved cost-to-performance ratios. Ashford warns, however, that wide area communications is ‘complex, fascinating, and closely interwoven with political and commercial interests’, and he advocates caution in dealing with it.

As the author of a recently published survey predicted (Ganley, 1992), the 1990s, ‘both in telecommunications and the... computer/component field... can be expected to be accompanied by harsh international competition.’ We are currently seeing
examples of competition between the British and 
United States telecommunications industries, and 
their changing orientations towards the international 
market. British Telecom has withdrawn from its 
investment in the USA's largest mobile phone 
operator, which its main international rival, AT&T, 
has bought; and at the same time is investing in 
AT&T's chief US rival in the long-distance carrier 
market. 

The world information industry, then, is a capital-intensive one, in a constant state of turbulent change, with intensive competition and rapid changes of players and alliances. Consequently, it is very hard to keep track of what is happening in it and of who is involved, and that places a heavy responsibility on those who have to decide what to buy and on what terms, and on the professionals who advise the decision-makers.


The implications for purchasers 

So what line should governments and individual enterprises which are entering the market as purchasers of the goods and services of the information industry take? What sort of relationships should they seek with a powerful industry that has so much to offer them, so that they get from it only what is truly useful to them? What dangers should they be aware of in order to avoid damage and loss to their own development?

The most important guarantee of a productive relationship between purchaser and vendor is that it should be based on mutual knowledge and understanding. On the purchaser's side, knowledge of:

• What their own business or organization needs to support its use of information in order to achieve its own aims

• What is happening in the industry, based on skilled, multi-disciplinary monitoring of market trends, warning signals and future changes

• The key facts about the vendors and products they are dealing with (something that is very difficult to get hold of, as Day (1993) and her colleagues have shown for electronic publishing)


and on the vendor's side:

• Knowledge of the purchaser's business

• Understanding of their requirements

• Respect for their definition of their own requirements.

It is the responsibility of the potential purchaser 
to show their own knowledge, so as to gain the 
necessary understanding and respect from the vendor. 
A leading British pharmaceutical research firm in
which I carried out a case study (Orna, 1990) told me that,
in selecting information technology products to 
support its strategies to achieve key business 
objectives, the company made its contacts at strategic- 
accounts-manager level in the vendor firms, took the 
initiative in drawing up lists of points on which it 
wanted clarification, insisted on having its questions 
answered in detail, and demanded presentations which 
were realistic, and not ‘sales-pitches’. In this way 
they ‘educated’ the vendors so that future dealings 
with them were more effective for both sides. 

Ashford, in his valuable contribution to the 
symposium already mentioned, summarized the 
situation for potential purchasers admirably: 

e Local Area Networks (LANs) — covering small 
area like a single enterprise — are established 
technology which bring a range of benefits; the 
networking of CD-ROM sources which has surged 
ahead in the past year offers shared access to 
valuable and expensive information resources. 

e Wide Area Networks (WANs) — covering large 
geographical regions — for financial, economic 
and transaction data will continue to dominate the 
volume telecommunications market, and access to 
them will be at commercial rates. 

• WANs for academic exchanges, etc., will develop
rapidly, but the eventual stable state is unpredictable.
There are, however, advantages from
immediate participation in a progressive way,
‘provided that investment in long-term assets,
equipment especially, is controlled at a realistic
minimum until the eventual balance of services
and costs becomes clear.’

• Problems of subject access and natural language
communications have to be resolved before full
advantage can be taken of networked information
services.


Some pitfalls to avoid 

1. Vendor over-selling:

As Cawkell (1992) observes: ‘the sales of
microcomputer hardware and software need to be
sustained by waves of hype’, that is, exaggerated claims
for products which bear little relation to reality. And
Ashford (1993) warns, in connection with LANs,
that ‘it is not easy to distinguish, either in print or in
conversations with information scientists, between
LIS applications showing real benefits, and simple
intoxication with new technology.’

2. Products for which accepted standards are not yet
established:

Again, Cawkell has sensible advice: ‘Standards
are discussed once a variety of incompatible
techniques have been introduced, and eventually
some kind of a standard is agreed, embodying
“alternatives” for organizations with sufficient clout
to get them included. This succeeds in bringing a
degree of order to chaos, and after a further period of
time the de facto winners emerge and the pseudo-Standard
gradually becomes a real Standard’.

3. Incomplete and misleading information from
vendors:

As Goodall (1993), among others, points out, it is 
usually difficult for would-be purchasers to get from 
vendors all the information that would allow them to 
judge whether a product: 

• Meets the quality standards they require in their
work and is neither too low nor too high in that
respect

• Has the capacity to meet future developments as
well as current needs

• Will bring the benefits desired, and will do so at a
cost which is well below the value they will get out
of it.


What makes for success in relations with the 
information industry? 

I define successful relations with the information 
industry as those where the outcome helps the 
purchaser: 

• To achieve strategic objectives

• To support innovation

• To maintain its competitive position

The organizations which get a ‘good deal’ in those
terms from their relations with the information
industry seem to have certain characteristics in
common. The observations which follow relate both
to institutions which exist to provide information
services, like the British Library, and to companies
which have their own in-house information services.
They are based on research studies, such as those of
Broadbent (1991) and Bowden & Ricketts (1992),
as well as on the case studies which I undertook when
writing the book Practical information policies
(Orna, 1990).

1. Some kind of policy or strategy for using 
information to support them in achieving their key 
objectives, which aligns use of information with 
business objectives. 

2. A clear definition of their objectives, and a shared 
interpretation of what the objectives mean. 

3. A shared definition of the knowledge which they
need to achieve the objectives, and the information 
resources they need to support the knowledge 
base. 

4. Investment in human resources for adding value to 
information. 

5. Investment in information technology on the basis 
of their understanding of what they need to do with 
information, and use of the technology to support 
their workers in using their knowledge and skills. 

6. Monitoring of the environment in which they 
operate, in order to keep their knowledge base 
nourished, combining information from outside 
with inside information; and horizontal as well as 
vertical communication of the results. 

7. Keeping their knowledge base up to date by 
interaction with researchers, other businesses in 
the same field, suppliers, investors, and customers. 


The criteria which successful organizations use in 
selecting products of the information industry 

Those organizations which gain benefit from their 
relations with the information industry seem to make 
conscious use of certain criteria in deciding what 
products and services to invest in. They are fully 
aware of what they want to achieve, of characteristics 
of their work which they want to encourage, and of 
negative features that they want to overcome. And 
they use that awareness to set their criteria. They
seek, for example, products that will support:


Key business objectives 

For example, the pharmaceutical research firm I 
mentioned earlier, in order to help its decisions on 
requirements for a new IT system, developed a set of
critical success factors from its business objectives.
With the co-operation of research and development staff,
a weighting was assigned to the success factors, in
the light of the current priorities in the corporate
objectives. It thus became possible to rank the system
requirements put forward by staff in an objective 
way that the originators understood and accepted. 
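The weighting-and-ranking procedure described above can be illustrated with a short Python sketch. It is not the firm's actual method, which the case study does not spell out: the factor names, the integer weights and the 0-3 scale for how strongly a requirement supports each factor are hypothetical assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def rank_requirements(
    csf_weights: Dict[str, int],
    scores: Dict[str, Dict[str, int]],
) -> List[Tuple[str, int]]:
    """Rank proposed requirements by weighted support for critical success factors (CSFs).

    csf_weights: CSF name -> weight reflecting current corporate priorities (hypothetical).
    scores: requirement -> {CSF name -> support score on a hypothetical 0-3 scale}.
    """
    ranked = []
    for requirement, support in scores.items():
        # Weighted sum over the CSFs; a CSF the requirement does not address counts as 0.
        total = sum(weight * support.get(csf, 0) for csf, weight in csf_weights.items())
        ranked.append((requirement, total))
    # Highest weighted score first; sorted() is stable, so ties keep their input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights and scores, for illustration only.
    weights = {"faster drug discovery": 5, "regulatory compliance": 3, "cost control": 2}
    proposals = {
        "shared literature database": {"faster drug discovery": 3, "cost control": 1},
        "document tracking": {"regulatory compliance": 3},
        "new spreadsheet package": {"cost control": 2},
    }
    for name, score in rank_requirements(weights, proposals):
        print(f"{score:3d}  {name}")
```

A plain weighted sum keeps the calculation transparent, which matters here: the point of the exercise was a ranking that the people who proposed the requirements could follow and accept.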


Existing valuable features 

The British Library counts as one of its most valued 
resources the skill and dedication of its cataloguing 
staff. As changes in collections-management strategy
and technology open up the possibility of saving
effort in recording data, and of much more flexible
user-controlled access to records, the Library is
moving towards using the skills of staff in new ways.
The aim is to help them concentrate on high-level
original cataloguing, and to spend more time on
thinking and less on manual recording, supported
by technology which allows a quick move from the
results of thinking to embodiment of the results in the
database.


Desired changes in ways of working 

Sometimes the acquisition of new products from the 
information industry can be unobtrusively used to 
help to eliminate unproductive ways of working. I 
met a nice example recently from a British 
government department. In our civil service, there is 
an old-fashioned tendency among some high-ups to 
unthinkingly declare too many documents as being 
confidential, to the detriment of the flow of essential
information (I have even heard of one department 
where there is a confidentiality classification that 
allows only the originator of a document to read it). 
So, in the process of designing the new system, the 
opportunity was taken to make it quite a laborious 
business to declare documents highly confidential. It 
is perfectly possible to do so, but the user who wants 
to do it has to learn a good many commands and go 
through a number of steps. This seems to have had 
the desired effect of making people think more, and 
declare as confidential only those documents that 
really need it. 


Interaction, communication and co-operation 
between people in different work groups in the 
organization, and between the organization and its
suppliers, customers, research institutions, etc.
These are all features which have been shown to
promote successful innovation and competition (see,
for example, the studies edited by Bowden &
Ricketts, 1992), and while they depend essentially
on commitment and understanding from the top level 
down, their achievement can certainly be supported 
and made more rewarding by a good choice of IT 
products and services. 


Maintaining a competitive lead 

Here I'd like to cite two important studies. One
(Koenig, 1992) deals with the relationship between 
information use and productivity in a large sample of 
firms in the world pharmaceutical industry; the other 
with the role of information management strategy in 
maintaining competitiveness in Nippon Steel of Japan 
(Bowonder & Miyake, 1992). 

Koenig's work shows that the key features of the
most productive firms include the following: they invest more in
developing information systems; their staff spend
more time on keeping abreast and use more computer
time; and their information service staff are
sophisticated both technically and in their subject.

The Bowonder and Miyake study found these
features to be most significant in creating and
maintaining competitiveness: continuous environment
scanning and technology monitoring; using the
results for conscious organizational learning; strategic
information systems at corporate level; and intensive
skill development in using information systems and
information technology strategies.


Initiatives by workers

The British Library consistently seeks to promote 
initiatives and co-operation across departmental 
boundaries, in order to develop its corporate strengths 
and use them most effectively. It uses such means as 
bringing together groups of managers from different 
areas over a 12-month period to carry through a 
group project; and setting up self-managed groups of 
workers from different departments with a specific 
task for which they themselves are responsible. 
Initiatives of this kind can be supported by such 
established products as e-mail, and will in future 
benefit from decision-support systems and group- 
work systems. 


Monitoring and evaluating the results of using
information

By monitoring I mean checking what is happening at 
regular intervals, and passing on the results for 
evaluation. Evaluation looks at the results from 
monitoring and applies agreed performance criteria, 
with the aim of seeing that the action is bringing the 
desired results. 

For quantitative checking that key activities are 
going according to plan, it is necessary to have the 
capacity to produce reports on essential factors. That 
means that specifications for software should require 
it to be able to produce appropriate reports, presented 
in ways that make it easy to make the required com- 
parisons, and highlight things that are going wrong. 
Today there is no excuse for deluging users with the 
voluminous and unreadable reports that gave Man- 
agement Information Systems a bad reputation. 
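As an illustration of a report that highlights what is going wrong instead of deluging users with figures, here is a minimal Python sketch of an exception report. The activity names, targets and tolerances are hypothetical, and the sketch assumes that each measure is one where a higher value is better and that every target is positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measure:
    activity: str     # monitored activity (hypothetical example names)
    target: float     # agreed performance criterion; assumed positive
    actual: float     # latest value from monitoring
    tolerance: float  # acceptable shortfall as a fraction of target, e.g. 0.1 = 10%


def exception_report(measures: List[Measure]) -> List[str]:
    """List only the activities whose shortfall against target exceeds their tolerance.

    Assumes higher values are better; activities within tolerance are left out,
    so the report stays short and points straight at what is going wrong.
    """
    lines = []
    for m in measures:
        shortfall = (m.target - m.actual) / m.target
        if shortfall > m.tolerance:
            lines.append(
                f"{m.activity}: {m.actual:g} against target {m.target:g} ({shortfall:.0%} short)"
            )
    return lines


if __name__ == "__main__":
    monthly = [
        Measure("enquiries answered within 24 hours", 200, 190, 0.10),
        Measure("new records catalogued", 500, 400, 0.10),
    ]
    for line in exception_report(monthly):
        print(line)
```

Only the second activity appears in the report, because the first is within its agreed tolerance.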


The process 

As well as applying appropriate criteria, organizations
which succeed in their relations with the information
industry and its products follow a series of steps that
help to ensure successful outcomes. They are
something like this:

1. Identify key business goals 

2. Identify what the organization needs to do with 
information in order to achieve them 

3. Monitor the relevant parts of the information 
industry 

4. Identify products/services, etc. that can support 
what the organization needs to do with information 

5. Build up appropriate interactions between the
organization and the vendors in the information
industry, drawing in the people in the organization
who have an essential contribution to make:

• Information managers: those who are expert in
the information needs of the organization, in
organizing information to make it accessible, and
in communicating with the users of information

• Information technologists: those who are
responsible for designing the systems which will
support the management and use of information

• The people who will use the products of the
information industry to help them in whatever
they need to do with information in their own
work.

6. Plan the acquisition of products and services. 

7. Anticipate, plan and implement the organizational 
changes needed to make sure that the new products 
can be fully and productively used, to the 
satisfaction of those who use them in their work. 

8. Plan and give the necessary training. 

9. Introduce new products and organizational changes 
in a phased way, monitoring to make sure they 
bring the desired effects, and learning from each 
phase. 


Golden rules that the successful organizations follow: 

1. Learn from your own mistakes and the mistakes of 
others 

2. Don't rush for the latest technology — let others 
take the risks 

3. Thinking is the most cost-effective investment you can
make, because it gives you the best chance of 
succeeding in your dealings with the information 
industry and using its products to good effect. 


Developments in the information industry that
are important for what China is seeking to do in
information services and information management

In the final section of this paper, I turn to what China
is currently seeking to achieve in the way of
information services and information management.
The source I am using is Zhao Yangling’s recent
excellent article New changes in the Chinese
information service (1993). I shall try to suggest
which products from the world information industry
are likely to be valuable in supporting the changes in
information services and information management
which are now taking place, and those which are
planned for the future. And I shall try to say why they
are useful, in relation to the points I have made.

The current key features in the new orientation,
according to Zhao Yangling, are:

• Partially paid-for service

• User orientation and initiatives by information
services, including services oriented to users and
their subject interests

• Extension of information services into the
economy, society, and the market place

• Increased concentration on information for
manufacturing and industry

• Moves towards new organizational forms,
integrating information services with scientific
research and production, and with technology,
industry and business

• Application of information technology in
international online services, domestic databases
(including image and sample information services),
document management and library automation

• Telecommunications to support information
services and information management.


Partially paid-for service

If one is seeking to generate revenue for information 
services by this means, one needs to be able to assess 
the true costs of acquiring and processing information 
and the value it brings to the users, because that is
the only sound basis for making decisions about 
what to charge for and how much to charge. 
Among the products of the information industry that 
should be able to help in this is recently developed 
software which will, as Woods (1992) says, ‘help the 
audit tracking of specific parcels of information to 
become a reality... Using the tracking abilities of this 
software it is possible to see where information has 
come from, where it has been and where it ends up’. 
Information is tagged in ways that enable analysis to 
answer such questions as: How has the information 
been used? Where has it helped to bring in revenue? 
What information flows have led to the achievement 
of any objective?
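The kind of tagging Woods describes can be pictured with a small data model. The Python sketch below is hypothetical: it does not reproduce the design of any commercial tracking product, and the class and field names (`Parcel`, `Event`, `revenue`) are assumptions chosen for illustration. Each parcel records its source and every place it passes through, so questions such as ‘where has it been?’ and ‘where has it helped to bring in revenue?’ become simple queries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    place: str                       # department, system or client the parcel reached
    use: str                         # what was done with it there
    revenue: Optional[float] = None  # revenue attributed to this use, if any


@dataclass
class Parcel:
    parcel_id: str
    source: str                      # where the information came from
    events: List[Event] = field(default_factory=list)

    def record(self, place: str, use: str, revenue: Optional[float] = None) -> None:
        """Tag the parcel with one more stop on its journey."""
        self.events.append(Event(place, use, revenue))

    def route(self) -> List[str]:
        """Where it came from, where it has been, and where it ends up."""
        return [self.source] + [e.place for e in self.events]

    def revenue_uses(self) -> List[Event]:
        """The uses that helped to bring in revenue."""
        return [e for e in self.events if e.revenue is not None and e.revenue > 0]


if __name__ == "__main__":
    p = Parcel("P-001", "online patent search")
    p.record("R&D library", "current-awareness bulletin")
    p.record("consultancy unit", "client report", revenue=1200.0)
    print(" -> ".join(p.route()))
    print([e.use for e in p.revenue_uses()])
```

Objectives achieved could be tagged on each event in the same way as revenue, which would let the same log answer the third question about information flows.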


User orientation and initiatives by information
services

The key to achieving this kind of approach is the
motivation of information professionals towards
interaction and co-operation with the users of their
service.

The building up of a culture of this kind can be
helped by those products of the information industry
which support communication among work groups,
and help information professionals to take initiatives
in delivering value-added information products to
meet users’ developing work needs. For example:

• Electronic publishing of current-awareness
materials from in-house databases, using desktop
publishing software to create products which are
designed to be accessible to the intended users

• E-mail, and the multimedia developments of it
that we can expect in the next few years

• Local area networks (LANs) used within organizations
for these and other active dissemination
purposes.


Extension of information services into the economy, 
society, and the market place 

Reaching out to groups of people who have not
previously had the opportunity of interacting with
professional information services is a very valuable
development. The products of the information
industry that are likely to be of most help in this
direction are perhaps those that help to reach
non-specialist audiences, especially multimedia (see
Cawkell, 1992) and interactive CD-ROM, including
the prospect of ‘Telemedia books’ which, as Barker
(1992) describes, use telecommunications to support
interactive distributed distance-learning activities.


Information for manufacturing and industry 

This area depends on information workers being able 
to monitor developments in industry, and in the 
economic and social environment that affects it, and 
to build interactions with their colleagues in industry. 
So here the most useful support from the information 
industry is likely to lie partly in existing well-established
applications like e-mail and networking,
and partly in applications that are still being 
developed, especially those that are designed to 
support communications and group working. 

Two interesting descriptions of how the 
interdisciplinary monitoring of developments in world 
science, technology, industry and trade is managed 
in Japan are given by Bowonder & Miyake (Nippon 
Steel Corporation) and Fransman (on the contribution 
of government — especially the Ministry of 
International Trade and Industry). 

According to Bowonder & Miyake (1992), 
scanning of the environment is seen as the most 
crucial step. The results are input to a comprehensive 
organizational intelligence system of business, 
commercial, financial, and technological intelligence 
— making use of various channels, such as trading 
houses, subsidiaries, foreign offices and banks, which 
helps to spread the overheads for scanning. The 
activities of individual businesses are, as Fransman
(1992) explains, complemented by the MITI network:
a matrix of vertical units, corresponding to main
industrial sectors, and horizontal ones which deal
with issues common to all sectors. Besides this formal 
structure, there are informal networks linking MITI 
with academic sectors and industry associations. 


New organizational forms: 

Initiatives by scientific and technical information 
institutions to establish integrated scientific research 
and production firms, or united business corporations, 
as described in Zhao Yangling’s article, are a 
development with great potential. 

They will benefit from the kind of monitoring 
systems just mentioned. Other developments from 
the information industry that should be particularly 
useful here are those which support group work and 
interdisciplinary co-operation, including e-mail and 
its multimedia extension, networking (in its current 
forms, and, long-term, its promised multimedia 
development), and communication-support systems.


IT applications for databases and information 
management 

For development in these ‘traditional’ areas of 
information management two aspects of the 
information industry and its products are important: 
1. Well-established technologies, products and
services, especially those which support networks
in the use of databases.

2. Current promising developments, including: 

• Software for LAN access to CD-ROM (Ashford,
1993)

• CD-ROM ‘electronic books’, defined by Barker
(1992) as ‘information systems that are capable of
providing their users with pages of reactive
electronic information with which they can
interact’. The vast capacity of this medium to store
text, sound, static pictures, animation, programs
and motion video in digital form opens up many
possibilities for archival, informational and
instructional texts.

• Hypertext and multimedia interfaces for databases,
allowing users to access them through a unified
and flexible interface (Cavallaro et al., 1993). This
one, however, still has some way to go before it
becomes a commercial possibility.


Image and sample information services: 

The existing audiovisual facilities and the collections 
of trade catalogues and samples in information 
institutions will also benefit from developments in 
multimedia and CD-ROM, which should help in the 
exploitation of these very useful, but rather intractable, 
materials. 


The future 

The major issues for the future are concerned with 
document distribution, online databases, and the 
integration of the online data network with the public 
telephone system through the national packet- 
switching data network. 


Distribution of documents for maximum utilization 

The current work towards co-ordinating the holdings 
and distribution of documents should benefit in due 
course from electronic document delivery via wide 
area networks (WANs), and the use of document 
imaging systems in electronic document delivery. 
And not embarking immediately on the last-named 
technology is no bad thing. The recent study by Day 
and her colleagues (1993) sets out both the long-term 
potential, and the problems for information services 
which the current state of flux in the industry creates. 

So far as the current technology is concerned, 
Williams (1992) has useful things to say about the 
growing link between optical character recognition 
and electronic document delivery, and use of 
document imaging systems in delivering documents; 
while Goodall (1993) sets out the complex factors to 
be taken into account in assessing the performance of 
recognition technology for document imaging 
systems. 

Apart from electronic document delivery, 
economic distribution and use of documents should 
also gain from ‘electronic books’ on CD-ROM. 
Barker (1992) envisages them for the future as ‘global
publications’ allowing for ‘dynamic sharing of
information’, a vision which depends on libraries
having polymedia work stations linked by
telecommunications facilities.

It is important, as discussed at the Sino-British 
symposium, for China to keep a range of options in 
the way of media and formats for delivering 
information, so as to meet the needs of different 
markets and different levels of use — from the simplest 
to the most sophisticated. 


Online databases 

The creation of an online network, linking
international and domestic databases, is a great
enterprise, in which the products of the information
industry will play a big part.

• Wide area networks (WANs)

The technology is, as Ashford (1993) points out,
less of a problem than the nature of ‘the information
itself, its organization, indexing and delivery’.
How, he asks, ‘can producers of these electronic
stores of information be induced to spend the
necessary resources on adding stable and consistent
descriptions and subject categories to their files
and records, so that the searcher has some hope of
retrieving more than a chance selection of relevant
material?’ And free-text indexing is an unlikely
solution, given the volume of information, and the
unpredictability of the results ‘when applied outside
well-formed texts confined to subject domains
understood by the searcher.’ Other major problems
are the language of the original texts (‘little
attention’, says Ashford, ‘is being given to making
wide area networks effective across language
hurdles’) and the expression of subject content in
other languages.

• Network-based multimedia information retrieval.
Authors differ in their assessment of the state
of progress. Bulick (1990) believes that the
technological conditions are already on the way
to achievement (widespread high-bandwidth
networks; inexpensive user appliances capable of
handling multimedia; and standards for representation,
compression, packaging and transport of
multimedia). Others, like Bailey (1990), point to
the factors that may slow the growth of the multimedia
market: especially competing standards and
vendor systems.

• Hypertext interfaces to external databases, which
will help users by providing a unified and flexible
way of accessing a range of differently structured
databases. Cavallaro and his colleagues (1993)
describe an ESPRIT project on these lines, but it
is still at the development stage.

One interesting area of discussion at the Sino- 
British symposium was the initiatives that China 
might take in developing its information service 
industry in relation to the world industry. Two 
suggestions were: 

• Making domestic databases of information unique
to China available internationally

• The possibility of China itself becoming a database
host.


Telecommunications — integrating the online data 
network and public phone network 

As Ashford (1993) says, today there is the possibility
of ‘revolutionary expansion of wide area communications’.
Communications capacity, based on satellite
channels and optic fibre networks, is increasing.
Developments due in the next few years should
incorporate sufficient bandwidth to serve telephone,
telex, circuit-switched digital data, TV and radio on a
common digital system at realistic transmission
speeds. ‘Fast packet-switching... will yield
improvements of about 5:1 in cost/performance for
this mode compared with standard X.25’. The problem
is that of a ‘potential increase in disparity between
developed and developing countries’ in the ability
to allocate large capital resources. He recommends
a cautious approach to these developments,
participating in a progressive way, but controlling
investment in long-term assets at a ‘realistic
minimum’ until the eventual balance of services and
costs becomes clear.


In conclusion 

In this paper I have been trying to pass on something
of the experience of information professionals and
organizations in other parts of the world in getting
the best out of the information industry.

The experience of others is useful to us all, but the
most valuable asset is the knowledge we bring from
our own experience of our own situation. That is the
irreplaceable and essential element for interpreting
the experience of others in our own terms, and finding
ways of using it that will work for us.

As Zhao Yangling wrote in the article of which I 
have made such extensive use, what is required of 
those scientific and technical information institutions 
who have to advise the higher leading bodies and 
help them make decisions is ‘the combination of
strategy and tactics and the combination of foreign
experiences and internal issues’.


New trends in terminology processing and
implications for practical translation


Blaise Nkwenti-Azeh 


Centre for Computational Linguistics, UMIST, Manchester 


Paper presented at Machine Translation Today, a conference jointly sponsored by Aslib, the Association for 
Information Management, the Aslib Technical Translation Group, the Institute of Translation and Interpreting, 
and the European Association for Machine Translation on 18-19 November 1993, at the CBI Conference
Centre, London. 


Abstract 

This paper examines how the changes currently taking place in terminology processing and documentation are
related to the multilingual needs of translation, and also how progress in natural language processing in general,
and terminology processing in particular, can contribute to the development of reliable, up-to-date terminology
support tools for translators. The paper also describes some recent experiences in the automatic identification of
terminological units from corpora. The paper concludes by identifying some specific areas in terminology
software development which can benefit from the expertise of translators and other language professionals.


Introduction 

Terminology is now firmly established and widely
recognized as a distinct area of study concerned with
the vocabulary of special subject languages, variously
referred to as ‘Languages for Special Purposes’
(LSP). Some scholars would even argue that terminology
has attained ‘discipline status’. This identity
manifests itself in at least three ways:

(i) the study of terminology is now backed by an 
established set of clearly defined theoretical as- 
sumptions (especially on the relationship between 
concepts, terms and extra-linguistic objects), 
methodological approaches and practical goals 

(ii) terminology now constitutes a separate component
in an increasing number of translator training
programmes; students in these courses are now
given greater exposure to the methodological
and practical aspects of terminology processing
(i.e. terminography); many translation schools
now offer regular seminars and short courses on
terminology to professional translators

(iii) several attempts have been made in the past and
others are currently being pursued at national
and international level to standardize the
description of terminological items. We may
mention, in this respect:

(a) the pioneering efforts of the Nordic countries¹

(b) the ISO-led magnetic tape exchange format
MATER², which was recently resurrected as
MicroMATER³ and taken on board by the Text
Encoding Initiative (TEI), albeit in much
changed form⁴

(c) two CEC-funded projects: EUROTRA-7, on the
feasibility of standardized and reusable lexical
resources⁵, and MULTILEX, on the definition
of a multilingual standardized lexicon for the
EC languages⁶.


Terminological research is a time-consuming
activity and occupies a considerable proportion of the
time spent on specialized translation: estimates of up to 60% of
total translator time have been cited in the literature.
In a recent study reported in Language International,
translators have been found to spend between 20 and
42 minutes resolving a single terminological problem.

In the past, translators used dictionaries and other 
printed reference works for term equivalents; these 
were supplemented by personal collections of 
bilingual terminology. In some cases, industrial 
organizations compiled collections of terminology 
of product documentation to be used by in-house 
teams of translators and technical writers. 

With the advent of computers and rapid advances 
in science and technology, the volume of technical 
literature has grown significantly; so too has the
multilingual need for such information, which is
increasingly complex and now requires more
specialized know-how or terminological research
than even ten years ago. Consequently, a new range
of computer-based lexical support tools has emerged 
(e.g. text databases, terminological databanks, 
CD-ROM dictionaries) in order to satisfy the LSP 
requirements of different groups of users. 

In what follows I will focus on terminological 
data banks (or term banks, as they are more popularly 
known) which have evolved directly from the printed 
technical dictionary, and as such are most relevant to 
translators. 


Evolution of term banks 

Motivation for term bank creation 

Term banks have been intimately linked with translation
since their inception in the mid-1960s and early
1970s. The earliest of these term banks were developed
by translation departments in large organizations:

(a) to supplement printed dictionaries by providing
up-to-date multilingual terminology

(b) to preserve centrally the considerable effort of in-house
language specialists, and to make this work
more widely available

(c) to permit greater terminological unity among translations
split up among different translators by
providing agreed, reliable and unified terminology

(d) to speed up the translation process by giving the
translator a single efficient reference tool.

In the past 10 years or so, we have witnessed a 
proliferation of term banks for research and commer- 
cial applications. More recently, term bank 
development tools have also been introduced for use 
in text-processing environments; these terminology- 
support tools are again aimed predominantly at 
translators. 

The first of the two indicative lists below 
enumerates the well-known term banks and also 
some lesser-known ones. The approximate date of 
creation is entered alongside the early term banks. 


List 1: A sample of term bank centres
• AMSI (USA)
• BATEM (Quebec)
• BD-TERM (Switzerland)
• BELGOTERM (Belgium)
• BTQ: 1973 (Quebec)
• BTB (UK)
• BTUC (Chile)
• BTUSB (Venezuela)
• CEZEAUTERM (France)
• CILF (France)
• DANTERM (Denmark)
• EURODICAUTOM: 1971 (Luxembourg)
• LEXIS: 1966 (Germany)
• NORMATERM: 1973 (France)
• NoTe (Norway)
• RUHRGAS (Germany)
• SURVIT (UK)
• TEAM: 1967 (Germany)
• TERMCAT (Spain)
• TERMDAT (Switzerland)
• TERMDOK: 1968 (Sweden)
• TERMIUM: 1975 (Canada)
• UZEI (Basque Country)


The second list is intended to give some idea of
the range of products (or rather, product names)
available on the market.


List 2: Some terminology software products
• Aquila
• Ascom
• Dicoterm
• Index
• INK TextTools
• Lingua-PC
• MicroCezeau
• Phenix
• Profilex
• Superlex
• Termex
• Term-PC
• TermTracer

Since translation is essentially concerned with
interlingual equivalence/matching of units of meaning
(as represented in a text), it is not surprising that
the primary emphasis, and sometimes overriding
preoccupation, in the majority of translator-oriented term
banks appears to be the documentation of foreign-language
equivalents. Also, as the relationship between
terms and their corresponding concepts is
generally assumed to be one-to-one, the problem of
finding (or more precisely, selecting) linguistic
equivalents in a target language is assumed not to be
as difficult as for general-language concepts.

In fact, however, as far as specialized
translation is concerned, the target-language equivalent
must be supported by, e.g., information on
conceptual equivalence and contextual appropriateness.
But, as far as terminology is concerned,
multilingual equivalence is a secondary consideration
when compared to, say, definition. The
importance of definition is illustrated by how frequently
monolingual dictionaries are consulted during
translation: where more than one foreign-language
equivalent exists, definitions are by far the
most reliable guide to disambiguation.

I will use the label terminology-support tools (TST) to
refer collectively to term banks and term bank software.

The quantitative growth in terminology-support
tools has, unfortunately, not been matched by a
significant change in quality. Such qualitative changes
as there have been have taken the form of making more
information separately available, i.e. increasing access points;
very little has changed in the information
categories that are available in the database as a
whole, e.g. the sort of information normally supplied
by cross-references.

There are a number of key problem areas which 
developers of TSTs have to address if real progress is 
to be made in terminological knowledge representa- 
tion. Some results from NLP and Lexical Data 
Processing are relevant in this respect. 

In the recent developments which I shall review 
below, the general orientation is towards the estab- 
lishment of a separate identity for terminological 
databases as reference tools for specialized vocabu- 
laries, notwithstanding the specific requirements of 
any one user-group. The emphasis will mainly be on 
the incorporation of fundamental principles associ- 
ated with special reference so that term banks (or the 
terminological lexicon) can provide the informa- 
tion required for the identification, fixation of 
reference, and correct use of the terms both in a 
monolingual and multilingual environment. 


Progress in term bank design 


The evolution of term banks can be subdivided into
three major phases or generations, which broadly
correspond to different levels of complexity of
terminological description (i.e. incorporation of
terminological principles and methods):

(i) The first generation started off as conventional
data banks (i.e. electronic dictionaries), and
incorporated little or no terminological theory.
These ‘term-oriented’ databases are the
predominant type today and include EURODICAUTOM,
TERMIUM, TEAM and LEXIS.

(ii) The second generation of term banks incorporated
some ideas of structure, notably
hierarchies. In spite of advances in computer data
management, the few implementations of ‘concept-oriented’
systems that exist include the
Danish term bank (multidisciplinary), the
Norwegian Term Bank (oil terminology), CEZEAUTERM
(initially soil mechanics), SURVIT (virology), and
the British term bank prototype (multidisciplinary).
Although this is a significant improvement
over the first-generation term bank, the
theory underlying the design of this generation of
database is inadequate to represent the diversity
of terminological relationships in any one
domain (e.g. type-of, part-of, cause-effect, process-product,
raw material-product, succession,
means of operation, etc.).

(iii) In the third generation of term banks, currently
still under development but already at an advanced
stage, terminology is viewed as problem-oriented,
specialized knowledge representation, and
the terminological database is seen as an expert
system for terminology. A prototypical example
of this new generation of ‘knowledge-oriented’
term banks is the knowledge acquisition tool
CODE (Conceptually Oriented Design Environment),
which is being jointly developed at the
University of Ottawa, Canada, by the School of
Translation and Interpreting and the AI Laboratory
of the Department of Computer Science.
The CODE environment allows for explicit
representation and subsequent retrieval of
multidimensional relationships (see Figure 1); it is
therefore a more realistic approximation of the
conceptual complexity of the knowledge domain.


Retrieval facilities in term banks 
The range of queries that can be addressed to existing
terminology-support tools is, in computational terms,
minimal and very superficial. Within these environments,
one can get responses only to simple queries
such as spelling, usage (language variety, context,
restrictions, etc.), foreign-language equivalent,
definition, context of use, restrictions on use,
bibliographic source, (other) subject(s) in which used,
and synonyms/abbreviations, all of which require
extraction of explicitly coded information from
within individual records, and access via the main
term or other index term.

Because most terminological databases (TDBs) still rely
on conventional (or enhanced) relational database management
systems for storage and retrieval, the two-dimensional
tabular representation of the relational model imposes
restrictions on the information categories across the whole
database. The uniform structure required by these
packages means, for instance, that one cannot elegantly
(i.e. without duplication) represent multifaceted or
domain-specific relationships within the same
multidisciplinary database. Assume the following
entry in one such database:


Lexical Entry: arthritis
Def: Any abnormality of a joint in which
     objective findings of heat, redness,
     swelling, tenderness, loss of motion,
     or deformity are present.
isa: inflammation
g_affects: joint
symptom: heat/
         redness/
         swelling/
         tenderness/
         deformity/
         loss_of_motion
g_affects_spec: rheumatoid arthritis/
                cricoarytenoid arthritis/...
cause_spec: bacterial arthritis/
            fungal arthritis/...
symptom_spec: hemorrhagic arthritis/
              deforming arthritis/...

It is difficult to represent the above relationships,
which are specific to the terminology of medicine,
alongside, say, those specific to automotive engineering
and others specific to information processing:

Relationships specific to medical concepts, e.g.
• for diseases: isa, g_affects, caused_by,
  has_symptom, transmitted_by, etc.
Relationships specific to automotive engineering
concepts, e.g.
• for vehicles: function, powered_by, transporting,
  medium, has_part, typical_agent, typical_size, etc.
Relationships specific to information technology
concepts, e.g.
• for storage media: recording_technology,
  degree_of_writability, physical_form, content, etc.
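The contrast with a fixed relational schema can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a description of any system named above: each concept is held as a frame whose relation slots are added per record, so medical and automotive records can sit in the same knowledge base without sharing a common set of columns. The `TermFrame` class, the `relation_types` helper and the `car` record are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class TermFrame:
    """A concept record whose relation slots are not fixed in advance (hypothetical sketch)."""
    term: str
    domain: str
    definition: str = ""
    # relation name -> related terms; each domain may use its own relation names
    relations: dict[str, list[str]] = field(default_factory=dict)

    def add(self, relation: str, *targets: str) -> None:
        self.relations.setdefault(relation, []).extend(targets)


def relation_types(kb: dict[str, TermFrame], domain: str) -> list[str]:
    """Relation names actually used by the records of one domain."""
    return sorted({r for f in kb.values() if f.domain == domain for r in f.relations})


arthritis = TermFrame(
    "arthritis", "medicine",
    "Any abnormality of a joint in which objective findings of heat, redness, "
    "swelling, tenderness, loss of motion, or deformity are present.")
arthritis.add("isa", "inflammation")
arthritis.add("g_affects", "joint")
arthritis.add("symptom", "heat", "redness", "swelling", "tenderness",
              "deformity", "loss_of_motion")

# Hypothetical automotive record: its own relation names, no empty medical slots.
car = TermFrame("car", "automotive engineering")
car.add("isa", "vehicle")
car.add("powered_by", "internal combustion engine")
car.add("has_part", "wheel", "engine")

kb = {f.term: f for f in (arthritis, car)}

print(relation_types(kb, "medicine"))                # ['g_affects', 'isa', 'symptom']
print(relation_types(kb, "automotive engineering"))  # ['has_part', 'isa', 'powered_by']
```

In a single relational table, the union of these slots would have to become columns, most of them empty for any given record; the frame keeps each record's relations local to that record.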


It should also be said that even the much-publicized
commercially available terminological packages
(software and/or terminologies) offer only stopgap
solutions to the terminological needs of translators,
and would need to be carefully hand-crafted to
handle complex terminological information realistically.
I will therefore exclude these when considering
long-term solutions to the LSP needs of translators.

Furthermore, processing textual information, for
example making ‘string searches’ in a particular
field, is not straightforward, because this function is
generally not part of the software design and requires
a separate program to be written to perform the task.

In a translation environment, users often require
information of an inferential or evaluative, rather than
a factual, nature, which cannot be obtained from the
majority of current systems, e.g.

(i) specific facets of interrelation: Which terms are
related to Y (by part, type, cause, process, etc.)?
(ii) nearest FL equivalent: What is the nearest foreign-language
equivalent for X?
(iii) contextual synonymy: Can term X be used in
the context of Y?
(iv) conceptual environment: List the immediate
conceptual information for X.
(v) functional aspects: What do you call a machine
that does Y? Or: Has X got any parts? List them.
(vi) relational description: List all terms which have
parts associated with them.
(vii) nature of interrelation: What is the relation
between terms X and Y?
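Several of these queries become straightforward once relations are stored explicitly as typed links between terms. A minimal sketch, assuming the knowledge base is simply a list of (term, relation, term) triples; the `TermKB` class, the relation names and the small satellite-communications sample are illustrative assumptions, and query types (ii) and (iii), which need multilingual and contextual data, are not covered.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (source term, relation, target term)


class TermKB:
    """Hypothetical triple store answering some of the query types listed above."""

    def __init__(self, triples: list[Triple]) -> None:
        self.triples = list(triples)

    def related_to(self, y: str, relation: Optional[str] = None) -> list[str]:
        """(i) Which terms are related to Y, optionally by one relation type?"""
        out = set()
        for s, r, t in self.triples:
            if relation is not None and r != relation:
                continue
            if t == y:
                out.add(s)
            elif s == y:
                out.add(t)
        return sorted(out)

    def environment(self, x: str) -> list[Triple]:
        """(iv) The immediate conceptual information for X."""
        return [tr for tr in self.triples if x in (tr[0], tr[2])]

    def parts(self, x: str) -> list[str]:
        """(v) Has X got any parts? List them."""
        return sorted(s for s, r, t in self.triples if r == "part_of" and t == x)

    def terms_with_parts(self) -> list[str]:
        """(vi) List all terms which have parts associated with them."""
        return sorted({t for s, r, t in self.triples if r == "part_of"})

    def relation_between(self, x: str, y: str) -> list[str]:
        """(vii) What is the relation between terms X and Y (either direction)?"""
        return [f"{s} {r} {t}" for s, r, t in self.triples
                if (s, t) in ((x, y), (y, x))]


kb = TermKB([
    ("communications satellite", "isa", "artificial satellite"),
    ("reflecting satellite", "isa", "artificial satellite"),
    ("satellite antenna", "part_of", "communications satellite"),
    ("satellite repeater", "part_of", "communications satellite"),
])

print(kb.parts("communications satellite"))
print(kb.relation_between("communications satellite", "satellite antenna"))
```

Queries (v) and (vi) both fall out of the same `part_of` links; no separate search program is needed for each question.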
It is, however, doubtful whether general-purpose
terminological reference tools will ever meet the LSP
requirements of translators. Translators tend to specialize
in a limited number of text types, e.g. legal
texts, chemical texts, social legislation, medical texts,
etc. Ironically, the areas where demand for translation
is greatest, and where the expertise of language
specialists is therefore most sought after, are those where either

(a) the vocabulary is not yet consolidated, especially
in the emerging disciplines, or

(b) the concepts are new to the language.

In the absence of up-to-date multilingual terminology
records, translators will undoubtedly continue to
be involved in

(i) term creation, and even more so in

(ii) systematic compilation of terminology from grey
literature, more of which is rapidly being assembled
and made available in machine-readable (MR) form.

The terminology component of translator training should
provide the necessary background for accomplishing
task (i), via term-formation patterns. It should
also provide the skills for identifying and extracting
terminological units from texts in task (ii).


New directions in terminology compilation 
Representational aspects 

A terminologically oriented knowledge management
system should facilitate the storage and retrieval of
coherent collections of terms. A significant development
with regard to representation is the terminology-specific
software developed under the CODE
system, which, as already mentioned, makes it possible
to represent multi-hierarchical and multi-relational
structures with minimal duplication of
information (Figure 1).


Retrieval aspects 

The most significant development in human-assisted
or machine-assisted terminography is the research
into the use of an integrated package of terminographic
and editing tools in the so-called ‘translator
workstation’ or ‘translator’s workbench’ (TWB).

A typical TWB should, among other things,
provide translators with an integrated package of
computerized terminology tools in a machine-assisted
human translation (MAHT) environment, with facilities
for multilingual text processing, (remote) access to
non-resident term banks and other terminology-support
tools (including other machine-assisted translation
systems), and dynamic terminology management (i.e.
machine-assisted creation/acquisition, extension and
maintenance of collections of terminology).

Unfortunately, the term-acquisition modules 
currently being developed within the integrated trans- 
lator workbench environments embody little termino- 
logical knowledge. They are inherently unsatisfactory 
because they have not resulted from a study of the 
term-formation and other sublanguage characteristics 
of the domains in which they are intended to be used. 


Term identification 

Ideally, the extraction of terms from a machine-readable
corpus should be performed automatically,
if we are to benefit from the speed and consistency
(though not necessarily the accuracy) which computational
tools provide. Researchers are currently investigating various
‘semi-automatic’ and ‘automatic’ ways of identifying
potential terminological units. Some collocational-type
methods have already been incorporated in
TWBs.

We at CCL have recently been examining the use 
of positional information of lexical items in term 
identification, using corpus texts and terms, e.g. 
from the field of satellite communications. This
terminology-oriented method exploits the regulari- 
ties in term formation which are characteristic of 
each special subject. 

The work so far has focused on identifying 
positional values from existing term lists and using 
this information to extract new units from a corpus. 
For example, if the input dictionary contains the terms:

• frequency assignment
• carrier frequency
• constant frequency assignment
• available bandwidth

the term-identification program should, and does in
fact, recover the terms

=> carrier frequency assignment
=> available frequency bandwidth.
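The positional idea can be illustrated with a small sketch. The paper does not give the CCL program's actual procedure, so what follows is an assumption: each word in the input term list is recorded as seen in a modifier (non-final) position, a head (final) position, or both, and any corpus word sequence whose last word is a known head and whose other words are all known modifiers is proposed as a new candidate. The function names and the two-sentence corpus are hypothetical.

```python
import re


def positional_profile(terms: list[str]) -> tuple[set[str], set[str]]:
    """Collect the words seen in modifier (non-final) and head (final) positions."""
    modifiers, heads = set(), set()
    for term in terms:
        words = term.lower().split()
        modifiers.update(words[:-1])
        heads.add(words[-1])
    return modifiers, heads


def candidate_terms(terms: list[str], corpus: list[str], max_len: int = 5) -> set[str]:
    """Propose word sequences of 2..max_len words that fit the positional profile."""
    known = {t.lower() for t in terms}
    modifiers, heads = positional_profile(terms)
    found = set()
    for sentence in corpus:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i in range(len(tokens)):
            for j in range(i + 2, min(i + max_len, len(tokens)) + 1):
                span = tokens[i:j]
                if span[-1] in heads and all(w in modifiers for w in span[:-1]):
                    candidate = " ".join(span)
                    if candidate not in known:  # only report new units
                        found.add(candidate)
    return found


TERMS = ["frequency assignment", "carrier frequency",
         "constant frequency assignment", "available bandwidth"]
CORPUS = ["The carrier frequency assignment was changed.",
          "Use the available frequency bandwidth efficiently."]

print(sorted(candidate_terms(TERMS, CORPUS)))
# ['available frequency', 'available frequency bandwidth',
#  'carrier frequency assignment', 'frequency bandwidth']
```

Besides the two terms cited above, the sketch also proposes ‘available frequency’ and ‘frequency bandwidth’. Because the test is purely positional, it will equally accept non-term combinations of the kind shown in List 4.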

Using a list of approximately 600 terms manually
extracted from a 50-page telecommunications text
corpus, we have, for example, been able to automatically
extract over 400 new potential terminological
units from the same corpus. The list below shows
examples of extracted terminological units having
satellite as an element:


List 3: Automatically extracted terms
• ionosphere sounding satellite
• justified satellite link
• land mobile satellite service
• long intersatellite link
• low altitude observation satellite
• low orbiting satellite
• major path satellite
• maritime mobile satellite service
• maritime radio navigation satellite service
• maritime satellite
• narrow beam satellite antenna
• near antipodal reverse frequency assignment satellites
• radio navigation satellite service
• artificial satellite
• complete satellite communications networks
• recurrent earth track satellite
• reflecting satellite
• satellite antenna gain
• satellite antenna polarization
• satellite antenna radiation pattern
• satellite antenna reference pattern
• satellite antenna reference radiation pattern
• satellite redundancy
• satellite repeater


Figure 1: Multidimensional relationships in a terminological knowledge base


One of the significant aspects of the methodology 
(apart from the fact that identification is so far fully 
automatic) is the fact that it can ensure comprehensive 
coverage of all the term combinations in a given 
paradigm. Term identification is not as straightfor- 
ward as it may seem (anyone who has done thematic 
terminology research will attest to this). The following 
examples highlight some of the problems with the 
positional approach, namely, inclusion/extraction of 
non-term compounds: 


List 4: Errors in term identification
• civil time
• magnetic disturbance
=> civil disturbance

• complete reflective surface
• digital information
=> complete information

• correct check
• picture information
=> correct information

• key pulsing signal
• picture element
=> key element

• aerodynamic force
• natural noise
=> natural forces

• outgoing country
• single sideband
=> single country


Eventually we hope, of course, to be able to do 
away with an input term corpus altogether, and to 
minimise the incidence of non-terminological units, 
by incorporating statistical, lexical-semantic and other 
parameters in the identification program. 
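One statistical parameter of the kind mentioned above can be sketched as follows. This is an assumed, swapped-in technique rather than the method used at CCL: a word's relative frequency in the domain corpus is divided by its relative frequency in a general-language corpus, and a candidate is kept only if every word in it scores at or above a threshold. The word counts, the threshold of 1.0 and the crude add-one smoothing are illustrative assumptions.

```python
from collections import Counter


def specificity(word: str, domain: Counter, general: Counter) -> float:
    """Domain relative frequency divided by general-language relative frequency.

    Crude add-one smoothing keeps words unseen in either corpus from
    producing a zero or a division by zero.
    """
    d_rel = (domain[word] + 1) / (sum(domain.values()) + 1)
    g_rel = (general[word] + 1) / (sum(general.values()) + 1)
    return d_rel / g_rel


def filter_candidates(candidates, domain, general, threshold=1.0):
    """Keep candidates in which every word looks more domain-specific than general."""
    return sorted(c for c in candidates
                  if min(specificity(w, domain, general) for w in c.split()) >= threshold)


# Hypothetical word counts from a telecommunications corpus and a general corpus.
DOMAIN = Counter({"carrier": 20, "frequency": 40, "assignment": 15,
                  "civil": 1, "disturbance": 10})
GENERAL = Counter({"carrier": 2, "frequency": 3, "assignment": 4,
                   "civil": 30, "disturbance": 3})

print(filter_candidates(["carrier frequency assignment", "civil disturbance"],
                        DOMAIN, GENERAL))
# ['carrier frequency assignment']
```

A filter this blunt would also tend to discard genuine terms built on general-language words with specialized usage (for instance, ‘available bandwidth’, if ‘available’ is frequent in general texts), which is the difficulty taken up in the next paragraph.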

Part of the problem lies in the fact that a good 
knowledge of the domain is often necessary espe- 
cially if general language words have specialized 
usage within the domain — as simple terms or in 
combination with other lexical items (general lan- 
guage words and special language term elements) to 
form compound terms. Any automatically generated
term list would therefore necessarily have to be
post-edited by a human specialist.


NLP-oriented terminography 
From the earlier summary, it emerges that the main 
changes in terminographic orientation over the past 
few years have been from word-based to concept- 
based systems, and from technology-influenced, 
database-dictated, inflexible structures to conceptu- 
ally-motivated, dynamically-generated systems. 
There are also significant methodological changes 
currently taking place in the field of NLP. 
Firstly, MT system developers are now moving
away from the pure rule-based approach, which has
been characteristic of the domain over the last decades,
towards empirical (corpus-based or example-based)
approaches which make direct use of information
extracted from large corpus resources (typically
parallel/translated texts), or hybrid approaches which consist
of a rule-based core with add-on empirical modules.

Secondly, there is broad agreement on the need
for separate general-language (GL) and sublanguage (SL)
lexicon modules (or at least for different types of
information for lexical and terminological entries), and
on the need to incorporate sublanguage-specific information
as an integral part of the grammar and lexicons of these systems.

The recent multinational efforts towards defini- 
tion of standards for NLP lexicon description, in 
particular, Eurotra-7 (1990-91) and MULTILEX 
(1991-92) merit special consideration here because I 
consider them to be of particular significance to 
translators, if we take the view that the integrated 
translation environment will be the setting for the 
future. 

The Eurotra-7 Study identified two main categories
of standards depending on their object: the
contents of linguistic description and its representation.
The study concluded in its Final Report, inter
alia, that:

Within descriptive linguistics, different 
theories and descriptive models are basi- 
cally interested in the same phenomena, but 
they classify the phenomena in different 
ways... such classifications of individual 
objects of an observational domain allow 
for different, even a priori incompatible 
generalizations (p.72). 

The authors of the report recommended that research
in general language and sublanguage be carried
out in parallel, as this would make it possible to answer the
following questions:

- to what extent can we share descriptive
devices between general language and
sublanguage?
- how can the peculiarities of
sublanguage which are usefully described in
terms of restrictions, deviations and preferences
with respect to knowledge about
general language items, be best accounted
for in a formal linguistic specification?
(p.112).

With respect to representational standards, the
description of the Multilex project (a follow-on from
the Eurotra-7 study) is based on the assumption that

‘the same format/formalism can be used in
SL and GL. It seems useful in order to
accommodate descriptions from a whole
range of sublanguages and from general
language, to have one common representation
or to have means of combining several
representations.’ (p.19)
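The idea of one common representation, with sublanguage peculiarities expressed as restrictions and preferences relative to the general-language entry, might be sketched as below. This is not the Multilex format; the entry layout, the `preferred_sense` and `typical_compounds` keys and the `entry_for` function are hypothetical.

```python
from copy import deepcopy
from typing import Optional

# General-language (GL) lexicon: one entry per lemma (hypothetical layout).
GENERAL = {
    "carrier": {
        "pos": "noun",
        "senses": {
            "transporter": "a person or company that transports goods or passengers",
            "signal": "a wave that is modulated to convey a signal",
        },
    },
}

# Sublanguage (SL) overlays: only what differs from, or is preferred over, the GL entry.
SUBLANGUAGE = {
    "telecommunications": {
        "carrier": {"preferred_sense": "signal",
                    "typical_compounds": ["carrier frequency"]},
    },
}


def entry_for(lemma: str, sublanguage: Optional[str] = None) -> dict:
    """Return the GL entry, with the SL overlay applied if one exists."""
    entry = deepcopy(GENERAL[lemma])  # never modify the shared GL entry
    if sublanguage is not None:
        overlay = SUBLANGUAGE.get(sublanguage, {}).get(lemma, {})
        entry.update(deepcopy(overlay))
    return entry


print(entry_for("carrier")["pos"])                                   # noun
print(entry_for("carrier", "telecommunications")["preferred_sense"])  # signal
```

Both lexicons share one entry format, so a sublanguage without an overlay for a lemma simply falls back to the general-language description.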

The above re-orientation of NLP and redefinition
of its components opens the way for translators
and other language professionals (who have so far
been marginalized in the development of MT
grammars) to play a greater role in helping
computational linguists and computer scientists identify
areas of potential translational problems and formu- 
late rules for resolving these issues. The pragmatic 
experience of translators can be deployed so that 
statistically-based preference mechanisms are more 
consistent with those of particular micro-environ- 
ments. We can equally rely on these professionals to 
provide realistic descriptions of language and equiva- 
lence which they encounter in routine work in actual 
texts, rather than relying solely on the intuition of 
computer scientists or other ‘non-language-profes- 
sional’ grammar writers. 


Conclusions
(1) Translators must therefore learn to separate terms
from words, identify compounds or other juxtapositions
which may be single units or casual
collocations, recognize variants and have criteria
for finding the standard form, etc.
(2) Being able to recognize that they are
dealing with a term rather than a word will
narrow down the search space in the reference
works to be consulted.
(3) Translators/interpreters also need to know where
there are conceptual or terminological incompatibilities
between their working languages, so that
they know when a paraphrase or a neologism is
necessary. Such incompatibilities can only be identified
by comparing, or through knowledge of, the
conceptual structures of the subject field
in the different languages.
(4) Although term banks and other multilingual
terminological reference tools are aimed primarily
at translators, the contribution of the latter to
the logical structure and content of these products
has so far been marginal (apart, of course,
from the use of translators’ terminology cards to
build up the collections).

(5) As the terminology requirements of NLP and
MAHT/HAMT converge, translators will be
called upon to play a greater role in lexicon
design by providing NLP programs with various
types of structural information (decoding, parsing,
disambiguation, interlingual mapping,
creation of new lexical items, etc.).

(6) Also, as up-to-date terminological reference tools
become increasingly available, mainly in MR
form and as part of an integrated translation
system, translators will not only need to be able
to evaluate the utility of terminological products
for particular operations. More importantly, trans-
(7) Computerized terminography and the availability
of databases of parallel texts offer
opportunities for extensive coding of text-type- 
specific contextual information on terms, their 
textual variants and foreign language equiva- 
lents. Contextual information will in future 
constitute a more central component of the de- 
scription of terminological items. In fact, the 
function of manually-entered definitions may 
have to be re-assessed if elaborate systems of 
terminological relationships are represented in 
the terminological database, and the facility
exists for automatically generating terminological
definitions from these and other information
fields.

(8) Data preparation requires enormous human
resources and is therefore uneconomical for
small-scale organizations. But, with the availability
of term-identification tools, systematic
collections can quickly be assembled from MR
data. Significant economies can be made by
being able to look up all or most potential
terminological units before embarking on the
text-conversion task itself. In these circumstances,
the role of the language professional
would typically involve development of firm-specific
terminology collections, and evaluation
and recommendation of commercial packages.
In order to carry out any meaningful evaluation,
language professionals need knowledge of such
benchmarks as user-friendliness, relevance of
information, completeness, flexibility, etc.
Finally, it is well-known that translators have a 
distrust of theory or theorising. In order for any 
of the above goals to be attained, we need first of 
all to convince translators that the solution of 
practical translation problems is assisted by an 
understanding of the underlying principles of 
terminology and that a sound methodology for 
developing terminology must also be based on 
the same theoretical foundation. 
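Point (7) suggests that definitions could be generated from the relations already held in the database. A minimal sketch of one way to do this, assuming a genus-plus-differentiae template for each relation type; the `TEMPLATES` table and the `generate_definition` function are hypothetical, and the output can only be as good as the coded relations.

```python
# Hypothetical phrasing templates, one per relation type.
TEMPLATES = {
    "g_affects": "affecting the {}",
    "caused_by": "caused by {}",
    "symptom": "with symptoms such as {}",
    "has_part": "having {} as parts",
}


def join_list(items: list[str]) -> str:
    """'a', 'a and b', 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def generate_definition(term: str, relations: dict[str, list[str]]) -> str:
    """Genus from 'isa'; differentiae from the other relations that have a template."""
    genus = relations.get("isa", ["concept"])[0]
    differentiae = [TEMPLATES[rel].format(join_list(targets))
                    for rel, targets in relations.items()
                    if rel in TEMPLATES and targets]
    head = f"{term}: a type of {genus}"
    return head + (" " + ", ".join(differentiae) if differentiae else "")


print(generate_definition("arthritis", {
    "isa": ["inflammation"],
    "g_affects": ["joint"],
    "symptom": ["heat", "redness", "swelling"],
}))
# arthritis: a type of inflammation affecting the joint, with symptoms such as heat, redness and swelling
```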
Abstract

The paper discusses information services which are being developed within the National Museums of Scotland 
(NMS), to support the Museum of Scotland Project. NMS has started to build a new museum in the centre of 
Edinburgh, to display its rich collections of Scottish material. It is due to open in 1998. The information needs of 
project teams are examined. A strategy for responding to the rapidly changing needs of project teams is described. 
The approach concentrates on developing a variety of information resources, which together meet the needs of 
users, and making some of them accessible throughout the organization as services on a data communications 
network. Internal services include collections databases, a collections management system for the Museum of 
Scotland Project, and the library catalogue online. External services include the development of a Scottish 
national database of museum objects, and access to the Joint Academic Network. The data communications
network is described, and the management of the system is discussed.


1. Background

The National Museums of Scotland (NMS) is a group 
of three large museums in the centre of Edinburgh, 
with several outlying sites. NMS was formed by 
amalgamation in 1985. The four million items in its 
collections cover a wide subject area: archaeology, 
military history, geology, decorative art, zoology, 
and the history of science and technology. The col- 
lections include a wealth of Scottish material, which 
represents a unique national resource for the study of 
Scottish history and natural history. 

The desirability of gathering the Scottish mate- 
rial together into a single display area of sufficient 
size has long been recognized. In 1993 NMS was 
given funding to start building a new museum on a 
site in central Edinburgh: the Museum of Scotland 
(MOS). Construction started in the summer of 1993, 
and the building is due to open in 1998. The Museum 
of Scotland Project presents an opportunity to de- 
velop a museum environment which will enhance 
public enjoyment of the objects themselves, stimu- 
late interest in Scottish culture, and promote an 
understanding of the ideas and conditions which 
created that culture. 


2. Project information 

When project planning began in 1991, it was clear 
that information support would be crucial. Informa- 
tion is needed by many different users: 

• museum managers
• specialist project managers
• architects and designers
• curators
• conservators
• public relations officers and fund raisers
• education staff, and those planning the MOS
  publications programme.

They are likely to need 

• museological and architectural information, to
  inform the design of the building
• information for research, education and publications
  about the objects and their historical and
  environmental contexts
• collections management information.

Some of the information already exists, and some
is being created or reorganized in an accessible form. 
Some information resources can be found within the 
organization, but much information will be retrieved 
from outside — from other museums and academic 
institutions, and from commercial sources. Informa- 
tion about the project is needed for publicity and 
fund raising, since NMS has to raise £8m itself to 
commission the galleries of the museum. 

A project of this kind is dynamic. As it progresses 
the information needs of those involved will change, 
and sometimes change rapidly. The explicit recogni- 
tion of the dynamic nature of the project, and the 
importance of responding to changing demands for 
information, has shaped the information services 
which we have created. 


3. Information strategy 

An information strategy has been developed for meeting
the needs of project teams, and for responding to those
needs as they change. There are four parts to the strategy:

• information resources are being developed in-house
  to meet the needs of project teams
• where technically feasible, these resources have
  been put onto an internal data communications
  network, making them available to everyone in the
  organization through terminals in offices, laboratories
  and stores
• work has started on identifying relevant
  external resources and developing appropriate
  access to them
• management time is devoted, month by month, to
  monitoring the information services and adapting
  them to new needs.

In the following section we briefly describe some
of the internal and external information resources. 


4. Internal resources 

4.1 MOSIS and NMS Collections Databases 

The NMS collections are large, diverse, old, and 
distributed across dozens of store rooms: their docu- 
mentation in 1985 was similarly dispersed and 
inconsistent. The central database was established to
make it possible to audit the collections, and to
improve the effectiveness with which they are used
by allowing all staff to use a single catalogue. NMS
began to build its collections databases in 1987, and
over 380,000 records had been input by the summer
of 1993. The database software used initially was
Minisis. Most of the data has now been transferred to
Quixis, a Minisis application.

The Museum of Scotland Project makes new 
demands on the information system. More staff need 
access to the information at the same time as the 
information is changing rapidly. Objects are chosen 
for display, and this has to be recorded, their dimen- 
sions checked, an assessment made of whether they 
need to be conserved, and so on. Objects are moved, 
and new ones are acquired. 

The range of information being handled, and the
new focus for its use, have led to its provision being
presented in a new way, as the Museum of Scotland
Information Service (MOSIS). The need for a rapid
response to emerging demands has prompted us to
define a new role for a member of staff who gives the
front-line service by being available to give help and
information, and who understands which parts of the
information are still incomplete, so that firm decisions
are not taken on the basis of facts which have
not been checked. Sometimes the questions are
unexpected: ‘Which objects proposed for display on
the top floor are too big for the lift?’
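A question like the one about the lift becomes a simple query once dimensions and proposed locations are recorded as separate fields. A sketch under stated assumptions: the record layout, the accession numbers (hypothetical) and the lift's internal dimensions are invented for illustration, and the fit test only considers the object carried upright or turned on its side (axis-aligned), not tilted diagonally.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    accession_no: str             # hypothetical identifiers
    name: str
    proposed_floor: str
    dims_cm: tuple[float, float, float]


LIFT_INTERIOR_CM = (140.0, 110.0, 210.0)  # hypothetical width, depth, height


def fits(dims, space) -> bool:
    # Axis-aligned check: the object fits if, after sorting both sets of
    # dimensions, each object dimension is no larger than the matching lift one.
    return all(d <= s for d, s in zip(sorted(dims), sorted(space)))


def too_big_for_lift(records, floor, lift=LIFT_INTERIOR_CM):
    """Accession numbers of objects proposed for `floor` that cannot go in the lift."""
    return [r.accession_no for r in records
            if r.proposed_floor == floor and not fits(r.dims_cm, lift)]


RECORDS = [
    ObjectRecord("X.0001", "carved panel", "top", (250.0, 20.0, 120.0)),
    ObjectRecord("X.0002", "brooch", "top", (5.0, 5.0, 1.0)),
    ObjectRecord("X.0003", "cart", "ground", (300.0, 150.0, 160.0)),
]

print(too_big_for_lift(RECORDS, "top"))  # ['X.0001']
```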

Subject retrieval is of the first importance in
working with a collections database. In NMS our
approach is completely pragmatic: it is based on a
high-level classification, which is as much as we have
the staff time to apply to the records.


4.2 NMS Library online catalogue 

The NMS Library was formed out of the libraries of 
the three largest museums in the group, the oldest of 
which was founded in 1780. It still operates from 
three sites and contains about 250,000 volumes of 
monographs and serials. Subject coverage is wide,
reflecting that of the object collections, and again 
Scottish material is strongly represented. 

The benefits of automation for the libraries in all 
three museums were recognized from the start of the 
project. There was a need to make the collections at 
each site accessible to staff throughout NMS, to 
maximise the support which the library could pro- 
vide to research undertaken by project teams. A user 
survey conducted in 1991 confirmed the importance 
of such access.³ Automation of the whole catalogue
has been undertaken over the past five years, and the 
complete online catalogue, containing 106,000 
records, opened to users in April 1993. The software 
used is Dynix. 

The online catalogue is searchable by words in 
the title, exact title, name keywords, exact names 
and by subject. The name keyword option, which
allows Boolean combination, is useful where a user
is not sure of the exact form of a name, particularly 
of a corporate body such as a museum or associa- 
tion (there are over 13,000 records with ‘museum’ 
in the name or title field). The catalogue records, 
which have been produced over several generations 
of cataloguing practice, are variable in quality. We 
have several large editing projects planned to run 
over several years, starting with name authority 
headings. 

The full catalogue has been available to users for 
less than six months, so it is too early to undertake a 
formal evaluation. Users find it particularly useful 
for subject searches, since it is now possible to 
retrieve relevant material, about the Jacobites for 
instance, which is held at more than one site. Several 
users have said that the 'added value' of the auto- 
mated catalogue to them is the fact that they can use 
it from their own terminal. This is particularly impor- 
tant to the Museum of Scotland Project teams, who 
are often too heavily committed to be able to visit the 
Library regularly. 


5. External resources 

NMS acts as a focus for the building of national 
databases of museum holdings.⁴ The main sources of
the records are the databases which are being created 
by individual museums, which can be brought
together on one machine in Edinburgh: so far 80,000
records have been collected from 30 museums. This
collection will grow rapidly over the next few years. We
anticipate that by the time the Museum of Scotland 
opens, museums in Scotland, including NMS, will 
have input almost one million records. 

The project teams have access to several other 
important information resources. The online cata- 
logues of the National Library of Scotland (NLS) 
include the Bibliography of Scotland, which holds 
records for all the Scottish material accessioned from 
1988. The National Monuments Record for Scotland 
is building a database of information about all its 
sites. The Scottish Record Office is gradually
automating the records for both its own collections,
and external collections reported in the National
Register of Archives.

Aslib Proceedings, vol.46, no.3


6. Data communications 

The aim of the NMS’s internal data communications 
network is to give the staff access to the information 
they need from central databases at their desks. The 
staff are dispersed across five sites, the library across 
three and the objects across thirteen. Often the indi- 
vidual needs to know about books or objects which 
are on another site. It is not our intention to take up 
the concept of the integrated library/collections
documentation system. Instead, we believe that the most
cost-effective information service can be created by 
giving access through the PC on the desk to all of the 
Museum's central but separate information systems. 

The network is not technically sophisticated: it is 
a large star focusing on the minicomputers which are 
located in the same room. Within the main building, 
where the minicomputers are located, there are 65 
lines. Four more sites are served by another 36 lines 
and dial-up access over the telephone network is also 
possible. The main building is itself so large that line 
drivers are needed to boost signals on two-thirds of 
the lines in it, and more than half of the time of one 
member of staff is taken up with the maintenance of 
the network. The Museum's commitment to making 
information easily accessible to staff is demonstrated 
by the fact that a third of the total cost of hardware, 
software, and central hardware has been devoted to 
data communications. 

Our access to external networks is at present 
limited, though this situation would be radically 
changed by joining the Joint Academic Network 
(JANET). The central databases, particularly the cata- 
logues of university libraries, can for the time being 
be accessed over the public telephone network. Tel- 
ephone lines can also be used for dialling into JANET 
with the sole aim of using library catalogues, and for 
searching commercial databases. These links are 
limited, however, since they can only be established 
from a terminal or PC beside a modem: it would be 
better to have access from anywhere on the internal 
network. 


7. Management 

We have described how our organization is support- 
ing the information needs of a major project, now in 
its early stages, and with five years still to run. It 
would be difficult to evaluate the overall effective- 
ness of information support in this environment; the 
resource or service evaluated will be out of date by
the time the evaluation has been done. Our priorities 
are determined by the need to respond rapidly and 
flexibly to the complex and changing information 
needs of the Museum of Scotland Project. We have 
not tried to make some services as sophisticated as is 
technically possible, or to create intellectual links 
between databases; we have opted instead to make as 
much relevant information as easily accessible as 
possible, in a flexible and responsive environment. 

A key element of this flexibility is regular com- 
munication between system users, colleagues in the 
information services, and management. We meet 
directly with individuals, and we have established 
user groups for some services which are attended by 
representatives from the project teams. We obtain 
valuable feedback, which we use to keep our strategy 
under continual review. We have tried to ensure, by 
means of training programmes, that project teams are
aware of all of the resources available to them, and 
are able to use those they need. 


8. Conclusion 

The approach that we have described in this case 
study is essentially practical. Information is a key 
resource in any established museum; we have tried to 
demonstrate how it can be provided to support the 
dynamic and complex process of creating a new 
museum. We hope that our experience will be useful 
to others involved in similar projects. 

Summary

The trend away from development of fully automatic machine translation (FAMT) is the result of a failure to develop
the foundation level of machine translation (MT) systems design theory. To create this level and establish reliably
whether or not FAMT is achievable, we will have to revise our view of the interdisciplinary approach. The
paper ends with an assessment of the interdisciplinary approach as applied to date.


I shall start with a rough sketch of the aim of this 
paper. Firstly, let me explain what I won't be talking 
about. I am not going to be passing on anything 
relating to Nissan; what follows is entirely my own
point of view. I'm not going to concentrate on
particular software packages. Nor am I going to be
putting forward or talking about a particular theory 
of language. I am not an empiricist or a rationalist or, 
indeed, an advocate of hybrid systems. My aim is to 
describe what I feel to be a trend in the field of MT 
systems design today and then to provide my inter- 
pretation of it. 

I am aware that in the process I will be running
the risk of foundering in dangerous waters. There is
a prevailing ‘No theories, please, we're British’ type
of culture and, although I intend to steer clear of most
of the theories which hover in the background of this
topic, it is inevitable, in attempting to disentangle
what lies behind the trend, that my conjectures will
end up as theoretical as anyone else's. I hope you
will bear with me.

The ultimate aim of machine translation research 
was, until recently, to create computer systems which 
would simulate the whole range of human transla- 
tion activity and stand in for it when required. It was 
hoped that it wouldn't be long before we would have
machines which would do this work automatically,
without human assistance and, given the right
training, on anything they were handed: FAMT, in
other words.

The media have, of course, always loved this idea
and so has the public, because it seems like a hefty
step towards the dream of the independent computer
with a mind of its own, able to communicate with
humans as the completely fictitious HAL appeared
to be able to do in the film 2001. But the current trend
in MT systems design is away from FAMT towards
systems with more limited aims. These can be
broken down into three main types.

First, there are the human-assisted MT systems
represented by the surviving general-purpose
systems, for example, SYSTRAN. These systems are
the descendants of the original attempts to achieve 
FAMT and they rely on human pre-editing of source 
texts or post-editing of target texts to make up for the 
system’s limitations. The necessity for this human 
activity was originally seen as a measure of a sys- 
tem’s failure to achieve fully automatic status but it 
has subsequently become institutionalized and freed 
from any hint of failure. 

Then, there is machine-assisted human
translation: this approach is most often described
as the ‘translator’s workstation’ or workbench, in
which the translation work is done by the human
translator aided by a range of software tools such as
integrated dictionaries, parsers, editors, job handling 
software, modem access to technical databases, and 
so on. 

Finally, there are the controlled language appli- 
cations: these systems operate only in particular 
application areas, e.g. weather reports, production of
technical manuals, stock market reports and so on
(the advantage of this approach to system design is
that you end up with an automatic system which is
relatively error-free, because it restricts the use of
language to a fixed vocabulary used in a limited set of
sentence structures to exclude ambiguity).

Now this looks to me like a trend towards a more 
piecemeal approach to MT and it seems to be based 
primarily on the poor success rate of research into 
fully automatic systems over the forty-odd years
since this research started. Effectively, forty years
just hasn’t been enough to crack the mystery of
precisely what is happening when we produce
natural language or translate it.


But what I find even more disturbing than this 
failure is the fact that no-one appears anxious to call 
a post-mortem. No-one seems to be willing to ask 
what has gone wrong. A number of the difficulties
involved have been pointed out, but no-one has
provided substantive reasons why these difficulties
should be insoluble. Instead there seems to be a
rising swell of intuitive judgement that the task of
developing FAMT is simply not achievable.

The need to resort to intuitive judgement arises 
from the fact that there is no widely accepted founda- 
tion level of systems oriented research into language 
phenomena. And without a foundation level you can 
never even hope to make a reliable decision on 
whether or not FAMT can one day be achieved. 

So why was a foundation level never laid down? 
If this had been a question of computerising a com- 
mercial company, the systems builders would have 
been straight in there doing a full systems analysis of 
every aspect of the business before considering a 
single line of code. However, I think it is fairly safe 
to say that the earliest systems builders seriously 
underestimated just how complex the problem of 
simulating language phenomena is. But their approach 
has largely set the stage for later attempts. 

Also, of course, there has always been a large 
corpus of non-systems oriented expertise on lan- 
guage phenomena available in the form of scientific 
research. So, perhaps, it seemed only natural that MT 
research should rely on the work of more traditional 
disciplines to provide it with the descriptions of 
language phenomena on which to base its judge- 
ments regarding what is achievable and what is not. 
This, I feel, was a crucial mistake. 

The reason is that, unfortunately, no one scien- 
tific discipline covers the list of topics you would 
have to tackle to uncover the relevant information 
and very little of what is currently available regard- 
ing these topics is in a systems oriented form. 

To provide your design theory with an effective 
foundation level, you would need reliable systems 
oriented information from many different areas of 
research. You would, for example, need to know the 
detailed nature of the physiological mechanisms
controlling language. You would need to know precisely
why and how language evolved. You would need to 
understand fully the relationship between language, 
society and culture and the relationship between 
language, logic and problem-solving. And you would 
need to identify and explain the learning process 
which permits more than one language to be learnt 
by a single individual. 

Unfortunately, modern science still does not 
have solutions to these problems. What is worse, 
however, is the fact that anyone wanting to pick up 
the thread at the most recent point in the investiga- 
tion will face a daunting task deciding even where 
to start. These topics are dotted about the discipli- 
nary landscape in a chaotic tumble. This is largely 
because each of the disciplines involved has a tradi- 
tional structure centring round a set of issues and 
research aims which evolved separately from those 
in other disciplines and long before the advent of 
computers. 

As a result, these aims and priorities are, more 
often than not, pointed away from the areas which 
need to be investigated to solve MT problems.
Physiology, for example, has no real interface with
sociology or anthropology, so that it has to try to explain
language in isolation and inevitably fails to achieve 
anything in the process. 

In these circumstances, we need to ask ourselves 
why it should have been imagined that the situation 
would change for the better when the disciplines 
combine. The only way an interdisciplinary approach 
to these problems might have worked is if the issues 
involved had been re-opened, unpacked and re-ex- 
amined from the new viewpoint. Unfortunately, no 
one has ever attempted to do this.

Now, this may seem like a very bleak view of the 
situation since it seems to confirm that the current 
intuitive consensus is correct and there is no way out 
of and beyond the current problems other than the 
fragmentation referred to earlier. However, I believe 
there is a way out if we are prepared to turn to the 
appropriate set of tools for the job and just such a set 
of tools is available in the ideas of the historian of 
science, Thomas Kuhn. 


The limits of the interdisciplinary approach 

a) The Kuhnian nature of the problem 

Kuhn, in The Structure of Scientific Revolutions, said
that there are always limits to the questions which 
can be answered within the structure of knowledge 
at a given point of human history. These limits are 
imposed by the basic model of the universe pos- 
sessed at that point. He called this basic model the 
‘paradigm’. To find the answers we are searching for 
in such situations we have to look outside the para- 
digm and, if necessary, build a new one. So, what 
does he mean by a paradigm? 

The traditional view of the evolution of human 
knowledge is that it has been a steady piling up of 
experience. In Kuhn’s view, this is not the case. 
Knowledge doesn’t just accumulate in a steady lin- 
ear progression. It has also, more significantly, often 
gone through periods of relatively sudden change, 
e.g. as with the acceptance of Copernicus’ view of 
the solar system. In such periods, some central gen- 
erative idea changes or shifts. In the Copernican 
case, it was the idea that the earth and man’s place on 
it are at the centre of the universe. These central, 
generative ideas are what Kuhn calls paradigms. 
When they change, large amounts of knowledge are 
virtually dumped as irrelevant and previously incon- 
ceivable avenues of insight are opened up. 

With hindsight we can look back on the history of 
the interdisciplinary approach to MT and detect sev- 
eral Kuhnian features. 


b) Kuhnian features of MT 

Modern scientific disciplines have only recently modi- 
fied the severe territoriality which has customarily 
characterized the maintenance of interdisciplinary 
boundaries. In fact, the development of interdiscipli- 
nary teams working on the whole range of artificial 
intelligence (AI) problems (MT is just one of these, 
after all) has always itself been acknowledged as
something of a minor paradigm shift.

However, this has been a rather passive shift. 
Research has been forced down this avenue because 
(as already mentioned in the case of MT) no one 
discipline seems to provide a sound basis for tackling 
AI problems. The limited success of interdiscipli- 
nary research raises the suspicion that what we are, in 
fact, observing in it is a cosmetic process of papering 
over the cracks in a paradigm crumbling under the 
stress imposed on its inner limits. 

And all the signs indicate that a very strict limit to 
the human imagination is at work here. This is re- 
flected, for example, in an inability to conceive of 
appropriate basic architectures for the computer hard- 
ware needed to support simulations of biological 
phenomena. This applies even to the recently devel- 
oped neural network architecture which is regularly 
described as being based on the workings of the brain. 

Unfortunately, you won't be able to find a 
neurophysiologist to confirm this label because, al- 
though neurophysiology has made great strides in 
explaining how neural signals are generated in the 
individual neurone and what is involved in transmit- 
ting them across single neural clefts, it is still anyone's 
guess how they are integrated in actual movement 
and perception. We still don't know, in other words, 
how the brain works, so it would be more appropriate
to compare these networks to the action of a
self-regulating sieve.

Perhaps the most significant example of the fail- 
ure of the interdisciplinary approach can be seen in 
its inability to redefine the traditional paradigm re- 
garding the relationship between movement and 
perception in the physiological organization of knowl- 
edge. Researchers have been aware for some time 
now that human knowledge or consciousness is de- 
pendent on a constant feedback relationship between 
movement and perception. 

But the traditional disciplines are still based on a 
paradigm which separates perception and cognition 
from movement and gives them a higher priority. 
This is essentially because the traditional disciplines 
have never completely divorced themselves from the 
task of explaining perception and cognition in a way 
which justifies our own society's view of the indi- 
vidual as intrinsically rational. 

This has several results. We still do not, for
instance, have the kind of grasp of how the vertebrate
system coordinates movement which would permit
us to develop a computer simulation of vertebrate
movement. Also, we are left with the impossible 
situation in AI where we are attempting to simulate 
highly flexible organic phenomena like language 
with computer architectures condemned to a primi- 
tive rigidity from which there appears to be little 
likelihood of escape. But perhaps the worst effect of 
all is that we end up having to poke and hope, to pick 
at the problems one by one and in virtual isolation 
from each other. 

c) Consequences for MT

In this situation, it is almost inevitable that no progress 
will be made towards the solution of fundamental 
problems like deciding on the achievability of FAMT. 
The only factor which makes the arbitrary separation 
of MT from the other areas of artificial intelligence 
seem acceptable is the way it appears so reminiscent 
of the interdisciplinary boundaries of traditional 
sciences. 

This, however, has the negative effect of making 
it more likely that, as topics, MT, intelligent knowl- 
edge based systems, robotics and pattern analysis 
will start to develop the mutual exclusivity observ- 
able in the traditional sciences which will increasingly 
block the cross-fertilisation essential to further 
progress. If the current trend towards concentration 
on short-term gains in MT and AI in general takes 
hold there is a danger that the situation will progres- 
sively ossify in this position. 

The long-term issues will only retain their vital- 
ity and significance if we are prepared to face up to 
and act on some sobering possibilities. We will, for 
example, have to acknowledge more openly that we 
have finally reached a point where a purely empirical 
science approach with its subjective analysis of ob- 
servable data is no longer enough and we have to ask 
ourselves: is it, perhaps, the assumptions we have 
been working on at the basic paradigm level that are 
blocking our path? Effectively, that is, we will have 
to recognize that we are at a critical turning-point in 
the development of our culture and that we have 
come up against its inner limits. 

Whichever route we decide to take out of the 
current situation we will only keep a grip on the 
realities of understanding the nature of language if 
we are willing to recognize the true scale of the 
problems involved. Language is the main medium of
communication for our species and, in an age when 
our knowledge is what we most pride ourselves in, 
we still do not know how it works. 

This is a cultural Everest we are confronted with
and we translators are its Sherpas, conscious of the 
deep mystery still lying within its slopes, watching 
with bemused curiosity the fluctuating energies, now 
failing, now in full flow, of the technically minded 
climbers exploring its surface. MT research may one 
day provide us with the oxygen bottles to reach its 
summit but it is unlikely ever to deprive us of a job. 

This is good news for translators and, maybe, for 
the human race. But it must be disappointing
news for anyone who feels that computerized
language technology should have a sound theoretical 
base. It is also very frustrating for those like myself 
who would like to see revealed more clearly what 
language is and what it can still do for us. 

Roger Penrose recently went to great theoretical 
lengths to prove that computers can never be made to 
simulate natural cognition. I feel there is some irony
in the fact that there is a much simpler reason why
computers cannot be made to simulate natural language
activity. The reason is simply that we still do not
know what language is and are unlikely to find out 
what it is as long as we cling as fiercely as at present 
to our illusions about what it should be. 

Obviously, there is a cost-effectiveness argument 
for getting back to basics and sorting out the founda- 
tion level. We will be much clearer about how we 
target our resources after we have done so. But I 
believe there is much more at stake than that. 

We are living in a time crying out for a new
paradigm in our understanding of ourselves. Even if 
paradigm studies were to confirm that FAMT is 
either not achievable or not worth achieving, at least 
the attempt to locate the way out of the current 
paradigm might uncover new knowledge which 
would help us to cope more effectively with the 
problems that lack of self-knowledge inevitably 
entails. 




Forty ways to skin a cat: users report on machine translation 








Forty ways to skin a cat: 
users report on machine translation 


Veronica Lawson 
Translation consultant, 65 Walnut Tree Walk, Kennington, London SE11 6DN 


Muriel Vasconcellos 
Translation consultant, Washington, DC. 


Paper presented at Machine Translation Today, a conference jointly sponsored by Aslib, the Association for 
Information Management, the Aslib Technical Translation Group, the Institute of Translation and Interpreting, 
and the European Association for Machine Translation, on 18-19 November 1993 at the CBI Conference
Centre, London. 


Abstract 

In probably the most extensive survey of MT use ever performed, some forty users have reported directly on their 
experience. This paper explores their responses. Many are favourable. What are they doing? Are they language 
professionals? What is MT for? ‘PCMT’ (affordable MT software for your desktop) has transformed the user
profile. One vendor has sold over 200,000 MT packages at under $100.


Introduction 

For fifteen years, since Xerox reported their new 
Systran installation at the first ‘Translating and the 
Computer’ conference in 1978,¹ the user's report has
been a mainstay here. The very title of the confer- 
ence series reflects that: we deliberately chose the 
concrete, practical ‘translating’ rather than the 
potentially abstract ‘translation’. 

This paper, however, concerns not one user’s 
report, but more than 40: 38 survey responses and a 
number of users’ testimonials. It deals first with 
probably the most comprehensive survey of MT use
ever performed, in which the International
Association for Machine Translation (IAMT) approached
some 75 users. Half replied, a remarkable response
rate. However, even 75 users are far from a
comprehensive sample. The survey was necessarily biased
towards the language industry, and in particular it
could not cover the myriad individual users. The big
story in MT now is an immense expansion in ‘PCMT’:
MT products for anyone’s personal computer. These 
do full-sentence batch translation, but at a price 
within easy reach of the ordinary person in the street. 
These conferences have never been concerned with 
such people, but it would be unwise to ignore this 
democratization of MT. The second part of the paper 
therefore deals with such users. 

A full report on the survey was given at MT 
Summit IV in Kobe in July, and has since been 
updated in the IAMT journal, MT News
International. The present paper, while it draws on that
report and on the raw data from the survey, comple- 
ments that report by quoting more from users. 


The IAMT survey of MT use

The main survey was performed in June 1993 by the
IAMT Secretary, Muriel Vasconcellos. The
questions (Table 1) had been devised some months before
by Joann Ryan for a pilot survey of seven users. Five 
of these were contacted again in June (two having 
fallen by the wayside). In the main survey the ques- 
tions were then faxed directly to the 70 other MT 
users (or in some cases prospective users) for whom 
fax numbers could be obtained, and the responses 


Table 1: Survey questions

System used? Since when?
Language combinations (from, into)?
Hardware platform? Since when?
Form of input (e.g. disc, downloaded files, OCR, manual keying)?
Purpose of translation?
Type of texts translated: genre (e.g. ‘technical manuals’), subject matter?
Output per year (number of words), percentage of total translation volume?
Dictionary size (number of entries) for each language combination?
Description of personnel who use it (e.g. contract translators, etc.)? How many?
Type and amount of pre-editing done?
Type and amount of post-editing done?
System for incorporating feedback from end-consumers?
Advantages, disadvantages of MT?
News flash: latest developments? novel uses of MT? plans for the future?





Aslib Proceedings, vol.46, no.3, March 1994. pp.83-87 










Table 2: Summary of MT use by survey respondents (a) 


Estimated no. of words per year (b)

User # | Year of startup


11,250,000 
17,000,000 
9,00,000 
2,500,000 


30,000,000 


10,000-100,000 
25,000,000 
10,000,000 


4,500,000 
1,600,000 


2,500,000-3,000,000 
44,000-60,000 


750,000-1,000,000 
2,500,000 


(c) 3,445,000 
2,000,000 
480,000 
350,000 


1,600,000 
375,000 


45,000,000 
1,500,000 


(d) 345,000 
25,000 
3,300,000 


Percentage of total translation volume








Type of texts machine-translated 
(genre, subject) 


Sci/tech. articles & documents (17 fields). 
Weather bulletins. 
Service & customer documentation. 


Variety of general & tech. document types & fields 
(public health, agriculture, management etc). 


Low-level in-house docs. & correspondence, recurring 
document types, expert group reports, rush decisions on 
minor matters, screening for previous translations 
(administration, finance & economics, agriculture & 
many other topics). 


Technical manuals. 
Technical service publications. 
Technical manuals. 
Technical manuals. 


Engineering-based applications (documentation & 
software) & some hardware documentation. 


Technical manuals (mainly computer-related). 
Technical manuals (computer-related). 


Technical: repetitive descriptions, software source codes, 
data sheets, lists etc. 


Technical manuals. 


Japanese subtitles for TV news in English. (Light news 
topics like ecology, animal-human relationship etc). 


Internal technical information via e-mail: memos, reports, 
letters, minutes, internal newsletters, technical sheets. 


Group insurance & pension contracts, employee booklets 
on insurance scheme. 


Titles & abstracts in JICST database. 
Online & hardcopy software documentation. 
Technical manuals (switching systems). 


Technical manuals (microcomputers, automobiles, other
machines & products). 


Technical manuals (process control system). 


Manuals, technical reports for end users (mainly informa- 
tion technology & telecommunications). 


Chem. abstracts, reports, data sheets, guidelines. 
Technical manuals, computer software. 


User instruction manuals for software applications 
(accounting, finance, media). 


Titles of unexamined patent applications. 
Scientific publications, manufacturing documents. 


Technical manuals/papers, questionnaires, lists, forms, 
reiterative texts (e.g. Consumer Price Index), phone books,
catalogues. 


Computer manuals. 


a) Eight of the 38 respondents did not provide the information being compared in this table. 

b) Figures for numbers of pages were multiplied by 250 to permit comparison. Those for less than a year were annualised. 

c) 85,000 titles plus 15,000 abstracts; average length of title estimated at 10 English words and of abstract at 150 English words. 
d) About 23,000 titles per year, estimated at an average of 15 English words each.

e) 90% of the abstracts are written in English by bilingual abstractors; of the remaining 10%, all (100%) are translated by MT. 
were faxed directly back. Thirty-three current MT users
responded, giving a total of 38 responses from a 
sample of 75. Sixteen were in the USA, 11 in Japan 
and 11 in Europe. 
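The notes to Table 2 convert several kinds of figures to a common annual word count. The Python sketch below reproduces those conversions. The 250 words per page and the per-item word lengths come from notes b to d; the pro rata annualisation and all function names are assumptions made for illustration, since the survey does not say exactly how part-year figures were scaled.

```python
# Conversions described in the notes to Table 2 (illustrative sketch).
WORDS_PER_PAGE = 250  # note b: page counts were multiplied by 250


def pages_to_words(pages: int) -> int:
    """Convert a page count to an approximate word count (note b)."""
    return pages * WORDS_PER_PAGE


def annualise(volume: float, months: float) -> float:
    """Scale a part-year volume to 12 months.

    Assumption: simple pro rata scaling.
    """
    return volume * 12 / months


def estimated_words(items: list[tuple[int, int]]) -> int:
    """Sum of (number of items x estimated English words per item)."""
    return sum(count * words_each for count, words_each in items)


# Note c: 85,000 titles at ~10 words plus 15,000 abstracts at ~150 words.
note_c = estimated_words([(85_000, 10), (15_000, 150)])
# Note d: about 23,000 titles per year at ~15 words each.
note_d = estimated_words([(23_000, 15)])

print(f"note c: {note_c:,} words")       # note c: 3,100,000 words
print(f"note d: {note_d:,} words/year")  # note d: 345,000 words/year
```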

Not included were survey responses from seven 
prospective users. Of these one was due to start 
MT the following month, one had called for bids, 
and the remaining five were performing pilot tests 
or feasibility studies. This group included 
CompuServe, whose online forum and e-mail serv- 
ice were to offer English-French MT from this 
autumn, with other languages to follow: ‘potential
volume is 30 million words PER DAY!’ (their 
emphasis). 

The 38 current users included most of the known 
large users. They had 17 different MT systems. Four 
had two systems, making a total of 42 systems. 
(Some, of course, also had more than one language 
direction, and/or the same system on more than one
site.) Of the 17 different systems, 12 were commer- 
cially available: there were eight users of SYSTRAN, 
six of METAL, four of DP/TRANSLATOR and two 
of its forerunner, Weidner's MicroCAT, four of 
Sharp's DUET, three of LOGOS, two each of Fujitsu's
ATLAS, Hitachi's HICATS and NEC's PIVOT, and
one each of Linguistic Products' PC-TRANSLATOR,
GENERALE-TAO and CATENA. The
remaining five had been developed for in-house use: 
METEO at Environment Canada, SHALT at IBM 
Japan, a CATENA-based system at NHK (Japan
Broadcasting Corporation), SPANAM/ENGSPAN
at the Pan American Health Organization, and the
JICST system at the Japan Information Center of 
Science and Technology. (Another vendor, Winger, 
sent user contact details for the survey, but too late 
for inclusion.) 

Thirty users gave figures for the volume of MT 
and/or for the percentages of MT in their total 
translation volume. Table 2 summarizes the use of 
MT by these 30 respondents, on whom this report 
concentrates. 

25 of these gave MT volumes, totalling some 180 
million words/year. Volumes ranged from 25,000 to 
no less than 45,000,000, with 18 of the 25 quoting 
millions. 

24 of these gave percentages of MT in their total 
translation volume. These ranged from 5 to 100%. 
Of the five users who machine-translate more than 
10 million words/year, all but one quoted high per- 
centages. The 25 million words of manuals 
machine-translated by the Canadian translation com- 
pany Lexi-tech represent 100% of their volume; the 
17 million words of weather bulletins at Environ- 
ment Canada, 85%; the 11,250,000 words of scientific 
and technical articles and documents at the United 
States Air Force, 80%; the 45 million words of 
manuals and software at Bull, 50%; and the 30
million words, mainly of low-level in-house docu- 
ments, at the Commission of the European 
Communities, 15%. 
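Because each of these five users quoted both an MT volume and the MT share of its total translation volume, the two figures together imply a total volume. The Python sketch below does that division. It assumes that each percentage means MT words as a share of all words translated; the resulting totals are derived here for illustration and were not reported by the respondents.

```python
from fractions import Fraction

# (MT words per year, MT share of total translation volume in %),
# as quoted in the text above.
LARGE_USERS = {
    "Lexi-tech": (25_000_000, 100),
    "Environment Canada": (17_000_000, 85),
    "United States Air Force": (11_250_000, 80),
    "Bull": (45_000_000, 50),
    "Commission of the European Communities": (30_000_000, 15),
}


def implied_total(mt_words: int, mt_percent: int) -> Fraction:
    """Total volume implied if mt_words is mt_percent of it (exact arithmetic)."""
    return Fraction(mt_words * 100, mt_percent)


for name, (words, pct) in LARGE_USERS.items():
    print(f"{name}: {int(implied_total(words, pct)):,} words/year in total")
```

For Bull, for example, 45 million MT words at 50% imply about 90 million words translated in all; at the Commission, 30 million at 15% imply about 200 million.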
All uses stated in these thirty responses are listed
in Table 2. About half of the respondents cite manu-
als; these, not surprisingly, are the most commonly
cited text type. The other text types vary widely (see
Table 2). They include reports, abstracts, correspond-
ence, subtitles for television news, patent titles,
insurance contracts, employee booklets, lists, cata- 
logues, questionnaires, newsletters, phone books, 
etc. The number of fields listed is also large. 

Most respondents have installed MT in the last 
five years: 73% of the 30 in Table 2, and 82% of the 
full responding population of 38. This indicates a 
notable degree of expansion of MT use. 


MT on the Clapham omnibus 

The expansion is gaining speed with greater public 
awareness of PCMT. Over a dozen companies now 
sell PCMT in the United States. Between them they 
offer 17 different language directions, with more on 
the way. More systems, too, are under development. 

By December 1992, when the US WordPerfect
Magazine balloted readers on their favourite
WordPerfect-compatible software, no less than 7865 
readers voted in the MT category, and should there- 
fore arguably have at least tested an MT package. 

The three favourites were Linguistic Products’ 
PC-TRANSLATOR, MicroTac Software’s LAN- 
GUAGE ASSISTANT series, and Globalink’s GTS. 
Linguistic Products, who offer 12 language combi- 
nations, report sales as doubling annually since they 
came to market in 1985. Globalink, offering seven,
were floated on the stock exchange in June 1993, 
with a prospectus claiming over 13,000 products 
sold since January 1990, from $299 to $998 retail. 
Above all, sales of MicroTac’s four bidirectional 
packages, which sell at under $100, doubled from 
100,000 in November 1992 to 200,000 by August 
1993. 

This trend is encouraged by the popular compu- 
ter press. As I was writing this paper, both IBM’s 
Helpware Magazine and WordPerfect Magazine
fell unsolicited on my doormat, both bearing articles
on PCMT.

The archetypal ordinary person in English law is 
‘the man on the Clapham omnibus’. When lawyers 
want to establish what is reasonable to the ordinary 
person, they conjure up the London commuter on a 
big red bus, the picture of normality. Now, perhaps, 
he is sitting there on the No. 137, literally performing 
MT on his laptop or palmtop. 

Perhaps all too literally. We know that only too 
well, but do these PCMT users know it? Do they 
know the foreign language they are working with? 
Who are they? 

Such users are not easy to identify. Client lists are 
usually confidential. Some information may be 
gleaned from testimonials, however, of which eight 
were passed on by two PCMT vendors. In particular,
they remind us that translation as we know it is not
the only kind. These testimonials are not to be con-
fused with our survey responses, not least because
they do not come directly from the users. However, 
they look genuine, and cast some light on who does 
what on the Clapham omnibus. 


Testimonials 
Parts of all eight testimonials follow, with original 
spelling. The first concerns Linguistic Products’ 
PC-TRANSLATOR, and is from a bilingual secretary 
in Maryland, in the international marketing division 
of an electronics multinational: 
We have been successfully using your Span- 
ish to English and English to Spanish translator 
softwares for a few years and would like to 
have the updated version... I am sure that your
new package is of superior quality, as always. 


The remaining seven testimonials are from users of
MicroTac's Language Assistant series. First, an
American in Paris:
I've used it (French Assistant) a little in 
translate mode, like the day there was no 
hot water in the apartment I’m renting and 
I had to go check with la gardienne. I cre- 
ated a file with the basic questions I wanted
to ask, each one expressed two or three 
ways with lots of complete clauses, simple 
sentences, etc. I was able to get some half 
decent sentences with a little tweaking and 
patience. I practised pronouncing the sen- 
tences a little bit and went down to knock 
on the office door. Normally I would have 
printed out the results and carried a page 
along as a ‘cheat sheet’ but my printer was 
out of order. I put my notebook on battery 
power and carried the PC along with me. 
Turns out that ‘La Gardienne’ was out and
her high-school age daughter came to the 
door. I guess I should have spent more 
minutes on the pronunciation practice be- 
cause the noises I was uttering left la fille 
de la gardienne looking perplexed. At that 
point I flipped up the display on the PC and 
held it so she could see the screen as I 
scrolled through my questions. Voilà. I was 
not the only one with water problems. The 
boiler was being repaired and the whole 
building was suffering along with me. I 
said ‘merci’ and returned to my apartment. 
Mission accomplished. I only wish I could 
have been a French fly-on-the-wall later 
that evening as ‘la fille’ told ‘la mére’ what 
the crazy American on the 7th floor did that 
afternoon. Well, what are tourists for if not 
to amuse the French citizenry. 


Secondly, the Catholic chaplain at an East Coast 
university: 
Last winter holiday I was asked at the last 
minute to be the Catholic Chaplain for the 
French speaking passengers on the Paquet 
(French Cruise line) cruise/expedition to
Antarctica when the priest from France had 
to cancel at the last minute. I was able to use 
the translation capabilities a few times when 
I had to quickly come up with a sermon. I 
sent through from English to French and 
then did the polishing myself. 


Another is from a management and financial
consultant in Texas:
I wanted to take this opportunity while or- 
dering my Spanish Assistant 5 upgrade to 
offer my compliments on the quality of your 
product. My company deals extensively with 
Mexico and throughout Latin America and 
this software has proved very valuable. I 
have installed the program on my notebook 
which I carry on my travels throughout Latin 
America. Since I am at best marginally pro- 
ficient at speaking Spanish, this program 
has been of great assistance. 


From Ontario comes the following:
I felt that I must include this letter along with
the registration card. Although not perfect 
(nothing is), your Spanish Assistant version 
5 is God sent. I have been using it for about 
three months. I have learned how to write 
English documents that the Assistant can 
translate into Spanish quite well... 

How do I know that the Spanish produced 
is good? Well I have a mate who is from 
Guatemala, C.A. His English is not the
greatest. Sometimes during discussions in English,
we do not understand each other. Situations 
arise when I know that I am not being under- 
stood. This is when we need to use the 
Spanish Assistant to communicate better, 
Wow! What a life saver. After reading the
output, I am understood completely. I also
ask him if the translation is grammatically 
correct. In most cases, most of the document 
falls within acceptable limits. 


A letter from a business in California reads:
I installed this program last night, and I was
amazed at how well it worked. I can barely 
speak Spanish, and I certainly couldn’t write 
in Spanish. I do understand ‘street’ Spanish. 
Thanks to you, I sent my first Spanish 
‘memo’ by fax last night. This is a wonderful 
program. I couldn’t believe how well it con- 
verted some of my letters and memorandums. 


Contrast a Spanish linguist, now vice president and 
quality assurance director for a subsidiary of a multi- 
national bank: 
I now seldom have the opportunity to use the 
skills for which I was hired by this corpora- 
tion — Translator of Legal and Commercial 
Documents and Training Manuals: Spanish, 
French, Italian and Portuguese to English... 
I’m thankful for and very satisfied with
your Spanish Assistant. Not only does it cut 
dictionary looking-up time considerably, but 
the capability to expand or modify
definitions is a welcomed flexibility. It's truly a
translator's dream come true.


The last testimonial is from a professor in Massachusetts:
Through your product (Italian Assistant), I 
have been able to correspond with my rela- 
tives in Italy since my trip in 1990 when I 
was introduced to them for the very first 
time. You should know that at the age of 57 
one thinks very hard about beginnings and 
endings. Holding on becomes very impor- 
tant especially when the only two vital links 
were disappearing: namely, my dad who 
passed away two years ago and my mom 
who is too old to write. 

Having never been formally schooled in 
the Italian language and trying to recall the 
dialect of my youth makes for awkward 
communications to my overseas family. Can 
you imagine my excitement when I 
accidently discovered your software pro- 
gram? Talk about dying and going to 
heaven!... 

Thank you for giving me back the other 
half of my family. 


This evidence from testimonials is at best anec- 
dotal. However, experience suggests that it may be 
representative of satisfied PCMT users, if not the
PCMT user population as a whole. In particular, it 
seems safe to infer that not all PCMT users are 
language professionals, or know enough of the foreign 
language to judge whether the MT is good. But can 
they judge whether it is useful to them? In most cases 
they probably can. There is, after all, more than one 
way to skin a cat. 
Abstract
This paper will look in turn at: the place and flow of terminology in the corporation; the attenuation that terms are
subject to in the course of that flow; some of the reasons that meanings get altered while words progress from user
to user; and what actions may help the speedy and correct dissemination of the intended message.


Significant words flow around a company 
We are not here concerned with the words we use in
everyday life, when the everyday meaning is obvious
to all of us. ‘Please wash up the coffee cups’ is
readily understood. To some, ‘a cup washer’ might be
a specialized member of the staff of a restaurant. To
others it may mean that domed edged disk much used
by carpenters and joiners. Note not just the
physical description of the problem item, but also the
deliberate introduction of ‘context’ by saying who used it.
But who invented that specialized term?


The role of inventors 
This age of rapid technological advance spawns, 
probably daily, entirely new artifacts. We can’t keep
talking about
the process of metal removal in which 
electrolytic action is used to dissolve the 
workpiece metal 
so production engineers speak of ‘electrochemical 
machining’ or ECM. Some of these new terms come 
from the universities, but these days as much comes 
from the industrial research and development labora- 
tories. All kinds of widgets need some words to 
speak explicitly of them. 

Perhaps the lab has passed to the production 
people a computer network device which they de- 
scribe as having the following new features: 

a router 
hardware flow control 
a new point-to-point protocol. 

A dictionary may imply that production are going 
to have to make a shaping tool for joiners, obtain 
some domestic plumbing fittings for the bathroom 
and set up a racecourse in the country for
the local nobs’ hunters. 

Once they have mastered the production engineer- 
ing involved, and the product is in a manufacturable 
state, a new group becomes involved. 


The technical writers 


Being somewhat more literary, they will be wrestling
with pretty words for:

the token ring
an asynchronous host 
the multiple server network. 

The concept that an engagement ring is a token of 
esteem can be mastered readily, and perhaps the host 
at the engagement party was not able to synchronize 
the tipping of the bottle with the placement of the 
glass, but the rules of a game of tennis where special 
nets are needed to field the services of several
opponents are giving difficulty. However, the service
manuals and the user guide get written, and it is all 
handed over to: 


The training department 
Now this department doesn’t actually do any training.
Its role is to produce more documents, and probably 
computer material, with which the distributors can 
train the actual users. They understand they have to be 
meticulous people, so the proper meaning of: 

disk de-fragmentation
transparent user access
mouse signal traffic
have to be precisely transmitted. They do see
difficulties in explaining how to apply super glue to
broken vinyl records reliably, particularly if the po- 
tential users are hard to find, having already taken on 
board the techniques of H G Wells’ invisible man. 
They have assumed that travelling mice leave small 
dark messages, now to be called signals, indicating 
their route, which may lead them to the users. 

Now another literary group gets to work:


The advertising and publicity staff 
Actually these are the first we have come across who 
do not see that what they write has to be right. (Note 
the homophone.) A product that needs the user to 
take a fortnight's course at £250 a day before he gets
a glimmer of what it is all about needs to be
presented as dead simple. They latch on to

a chat window with paging facility 
as they seem to understand it, and so the ignorant 
public is bound to be able to. 
However
multivendor interoperability over
synchronous links
sounds so good it has to go in untouched, unexplained.

So the product is ready to launch. Ha Ha! The 
marketing plan says ‘World-wide’, doesn’t it? And it 
all starts all over again in seventeen European lan- 
guages, let alone the rest of the world. If translation 
is separately organized for each country, as does 
indeed happen, there are another seventeen depart- 
ments needing to understand what they have been 
sent.

To recapitulate the course of information, look at 
it as a flow chart. (Fig 1.) 


Figure 1. The information flow chart, beginning with Development


An indication of the result for the user is summed 
up by Mark Whitehorn writing in PC User:


I’m closer to being a network freak than 
most, yet I find some networking documen- 
tation impenetrable . . . 


My interpretation of some of the terms I have 
quoted may be far-fetched, but the fact that there is a
problem at all is genuine. We need then to look at
possible reasons for misunderstanding. 


Novelty 

Companies try hard to steal a march on competitors 
by adding a completely new device to the latest 
model. Occasionally a completely new machine is
invented, for example the video cassette recorder.
That has to be called something. One group has
reduced that five morpheme term to VCR, another 
uses ‘video’, which is certainly easier to say, even if 
the etymological root is largely lost, and so there is 
no innate clue to the meaning in the term commonly 
used. It is this lack of meaning to which I am trying 
to draw attention. The chemicals business has the 
same difficulty. Someone invented 
2,2-di(p-chlorophenyl)-1,1,1-trichloroethane

which is scarcely a good name for an insecticide. I do 
not know who called it DDT, but it does not seem to 
me a very meaningful name either, even if we would 
rather not mention it at all on ecological grounds. I 
strongly feel that the use of initial letters is not to be 
encouraged.

In contrast the recently developed videophone 
has an accepted name that is readily understood from 
its root morphemes. 

It is obvious that a newly invented word, or a term
comprising common words in a new noun phrase,
needs explanation, but a word given a new or
restricted meaning may well need to be explained too. By
now most of us can recognize when ‘mouse’ is not a
rodent, but are we sure what ‘socket’ means in
client-server technology?

Devices are not the only concepts that need new 
terminology, even if we extend ‘device’ to include 
actions such as procedures (‘flow control’),
collectives like ‘style sheets’, and similar intangibles. There
are some verbs that get re-adopted, even intransitive 
ones. We saw ‘to chat’ earlier. Adjectives too get 
new meanings. ‘Transparent’ conventionally implies 
the use of vision, and when it doesn’t it means 
apparent when perhaps the person you are seeing 
through would rather not be giving himself away. In 
computing, however, it is reversed to mean that the 
user is ‘unaware’ of intermediate processes, which 
themselves set out to make themselves seen through. 
In fact there may not even be a user involved at all; it 
all happens ‘by mirrors’ as the wits say. 


Geography 
Ill-chosen words by themselves are not the only
reason for difficulty. If our development staff had 
been in California, and the tech writers were in 
Europe, they would not be meeting over coffee. 
Electronic mail is certainly helping, but it is some- 
times difficult to express not just that one does not 
understand the words given, but that the construction 
is unfamiliar. People in different countries, even if 
they nominally speak the same language, can misun- 
derstand each other about everyday needs, let alone a 
novel technology. Let us consider the word ‘through’. 
An American telephone operator asks if you are 
through if she thinks you may be finished, but her 
English counterpart is only asking if he has managed 
to connect you. 

The Oxford English Dictionary offers two main
definitions of ‘through’. The first, ‘from end to end’,
makes the Americanism ‘from Monday through
Thursday’ perfectly valid, but many English speakers,
expecting to hear ‘from Monday to Thursday’, follow the
example quoted in the OED, ‘through the gate’. To 
them, ‘out the other side’ is implied and so they 
would be happy to go through Thursday to Friday. 

Translators are not usually native speakers of the 
source language, so no one can complain if they 
cannot discern differences of this sort. 


Personnel 

The timespan of development and life of a product is 
enormous. Ten years is typically the time to design in
the aerospace industry, and then the sales go on for 
ten more. By the standards of the UK computer 
industry, each post in the development team will 
have had an average of four incumbents. Continuity 
goes. The original documents are now being written 
by new people with new backgrounds and literary 
style. The readers, having changed, no longer have 
the accumulated experience that came from asking 
questions, and indeed from having got it wrong here 
and there and been rudely acquainted with that sad fact.
The problems of novelty are going to be repeated. 


Documentation systems 

Problems are not confined to people. Documents are 
expected to be transferred and read by machines. 
Some people are none too bright. Machines are all 
dead stupid. What might have been sorted out by an 
astute reader will probably never be noticed by a 
machine. 

One would hope that a corporation settles on a 
word processing or electronic publishing system to 
be used worldwide. Nobody here needs telling that if 
they don't there are going to be some delays and 
losses in conversion. However, here are a few good
reasons why that may not happen, at least in the short run:


the company may have expanded by take- 
over, and the new offshoot was using 
something else 


the system previously used was really rather 
old fashioned, and perhaps competition is 
presenting much smarter material which has 
to be bettered 


the present system just cannot cope with the 
extended, or indeed completely different, 
character set for the languages of the coun- 
tries now being targeted 


the material may have to go to an outside 
agency. 

Given the real probability that different EP sys- 
tems are going to be seen, what is likely to happen? 
Ignoring possible losses of less-used characters, like
binding spaces, whose internal representation differs,
we may see some linguistic problems:


Effects of typesetting tags. The original material may 
have embedded in it some characters to change font. 
The corpus analysis and computer-aided translation
systems may have been set up to hide the
non-language parts of this:
<P24M%-2>V<%2>ALUE<%0> is 
100<F128>W<F255P255D> 


which is to display as 
VALUE is 100Ω


The V and the A are kerned together, and the tags 
would prevent a dictionary lookup system from find- 
ing ‘value’. There is an example of the font change 
problem too. 
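A lookup system can only find ‘value’ if the tags are removed first. The Python sketch below shows the idea. It assumes that every typesetting tag is enclosed in angle brackets, as in the example above (real publishing systems each use their own tag syntax), and the one-entry glossary is hypothetical.

```python
import re

# Assumption: every typesetting tag is enclosed in angle brackets,
# as in the example above.
TAG = re.compile(r"<[^<>]*>")

# Hypothetical one-entry glossary standing in for a dictionary lookup system.
GLOSSARY = {"value": "hypothetical glossary entry for 'value'"}


def strip_tags(text: str) -> str:
    """Remove typesetting tags, leaving only the running text."""
    return TAG.sub("", text)


def glossary_hits(text: str, strip: bool = True) -> list[str]:
    """Return the words in text that the glossary recognises."""
    if strip:
        text = strip_tags(text)
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() in GLOSSARY]


raw = "<P24M%-2>V<%2>ALUE<%0> is 100<F128>W<F255P255D>"
print(strip_tags(raw))                  # VALUE is 100W
print(glossary_hits(raw, strip=False))  # []
print(glossary_hits(raw))               # ['VALUE']
```

Stripping the tags also discards the font change, so the character that <F128> set in another font comes out as a plain W: the second problem mentioned above.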


Index entries. An author may have embedded an 
index entry inside a term, perhaps to ensure it returns 
the right page number from a long sentence. 


When using the point to <$Iprotocol;point 
to point>point protocol . . . 


‘Point to point’ in the text is interrupted, and may not 
be seen by the available dictionary lookup system. 
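One remedy is to lift embedded index entries out of the running text before lookup, keeping them for the indexer. The Python sketch below assumes that an entry is written `<$I...>` with `;` separating index terms, as the example suggests; the function name is hypothetical and other publishing systems use different marker syntax.

```python
import re

# Assumption: an embedded index entry is written <$I...>, with ';'
# separating the index terms, as the example above suggests.
INDEX_MARKER = re.compile(r"<\$I([^>]*)>")


def split_index_entries(text: str) -> tuple[str, list[str]]:
    """Separate embedded index entries from the running text."""
    terms = [term.strip()
             for marker in INDEX_MARKER.findall(text)
             for term in marker.split(";")]
    running_text = INDEX_MARKER.sub("", text)
    return running_text, terms


raw = "When using the point to <$Iprotocol;point to point>point protocol"
text, terms = split_index_entries(raw)
print(text)                      # When using the point to point protocol
print(terms)                     # ['protocol', 'point to point']
print("point to point" in text)  # True
```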


Termbank access. If the author's publishing system is
to be used, it may not be able to access the dictionary 
or terminology system in which all the information 
and guidance that has been built up is locked. This 
may be because the hardware and operating systems 
are different; Unix on Sun perhaps instead of MS-DOS.
Even within MS-DOS, Windows may be required for
the termbank, and the word processor may not support
object linking and embedding.

These examples refer to looking up in databases, 
but of course it is just as important to be able to add 
to them. This will be an equivalent problem. 


Figure 2. Lines of communication for query resolution
Problems and Feedback
The difficulties we have looked at have to get resolved. 
This is done by asking questions. In a very small 
company, working only in one language, this happens 
face to face or over the house telephone. The asker 
gets a positive solution. A company does not have to 
be enormous for this to be impracticable. Let us look again
at the information flow chart, fig 1. Fairly simple. 
Now add the lines of communication that could 
be set up for query resolution, fig 2. 

Even if the asker could identify the right person 
at the end of each problem link, the latter does not 
want to be at the beck and call of all the translators in 
all the languages, as well as the various documenta- 
tion teams that have written the original material. 
Anyhow he may be in bed when the call comes in. 
Some form of central fielder of questions is essential. 
The flow chart can then be somewhat simplified to 
fig 3. There are probably two levels of information 
provider needed, one near the developers, and the 
other to fan out the news to translators, who will 
inevitably be asking the same question many times. 
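The saving from a central fielder comes from answering each distinct question only once. The Python sketch below models that with a hypothetical `QueryDesk` class: a question goes to the developers the first time it is asked, and later copies from other language teams are answered from the stored reply. The crude matching (ignoring case and extra spaces) is an assumption; a real desk would need human judgement to spot the same question phrased differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QueryDesk:
    """Hypothetical central fielder of questions."""
    ask_developers: Callable[[str], str]
    answers: dict[str, str] = field(default_factory=dict)
    forwarded: int = 0  # how many questions actually reached the developers

    @staticmethod
    def _key(question: str) -> str:
        # Crude normalisation: ignore case and extra spaces.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key not in self.answers:
            self.answers[key] = self.ask_developers(question)
            self.forwarded += 1
        return self.answers[key]


desk = QueryDesk(ask_developers=lambda q: "a device that forwards network traffic")
for language in ("French", "German", "Italian", "Spanish"):
    desk.ask("What is a router?")  # the same question from each language team
print(desk.forwarded)  # 1
```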


Management role 

It is management’s responsibility to deliver the prod- 
uct at the best cost, on time, and of the highest 
quality. All the difficulties we have been discussing 
will impact those objectives. Take them in turn. 


Cost. Time is money. Engineers being diverted from
designing to answer questions are costing money.
Translators having to ask questions are costing themselves
money if they are on fixed price work, or the
client money if paid by the hour.

Figure 3. The simplified flow chart, with a central fielder of questions

The infrastructure to support mail systems may 
be lost as an overhead, but the marginal cost of heavy 
query traffic is still money. Programming, and per- 
haps a query database, to lay on an easy-to-use query 
system for translators to use costs money.


Time. Delay can cost even more money, and possibly 
the raison d’être of the corporation. If the product is
made available to the sales force late, they lose sales 
to competitors. Those sales are never recovered. If 
the product is much too late, the company may never 
recover its market share, and falter. The money we 
are talking here is much more than the internal costs 
of query support. 

Quality. The product has got to be right. The
documentation of a product (and of course the words in
the user interfaces of systems products) is part of
the product. It is not just the boxes. There is foreign
language documentation in the field for well known 
products which is actually wrong. I do not mean that 
it has not been expressed as well as it might, I mean 
the instructions are wrong. Part of the reason for this 
is that the unfortunate translators do not understand 
what they are doing. A poor product does not get 
repeat sales. 

The kinds of failures being described are not easy 
to cost. All the activities tend to get lumped together 
in the primary activities of the participants. A trans- 
lator does not make a timesheet entry every time he 
or she buzzes off a mailnote. But management does 
need to have an indication of what is happening. If 
they insist on seeing, and evaluating, reports from a 
query control system, they may see the scale of their 
own difficulties. 


Query control 

Queries must be counted. If all questions are routed 
through a central response point, this is simple. If the 
questions always come on an electronic mail system, 
analysis can be automatic. 

Some questions will have nothing to do with the 
words in the material. There will be problems with 
the publishing system, or data lost in transfer, and so 
forth. The bulk will however be questions of under- 
standing. A few of those will be a need for help on the 
grammatical structure of the original, and similar 
problems with style. The rest, and easily the main 
part, will be a need to understand the terminology. 
This has to be brought under control. 
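If every query passes through one response point, the counting can be a simple tally by category. The Python sketch below uses the three kinds of question named above; it assumes each query has already been assigned a category at the response point, and the example log is made up purely for illustration.

```python
from collections import Counter

# Categories taken from the text; assigning one to each query is assumed
# to happen at the central response point.
CATEGORIES = {"system", "grammar/style", "terminology"}


def summarise(queries: list[tuple[str, str]]) -> Counter:
    """Count (category, question) pairs by category."""
    counts = Counter(category for category, _question in queries)
    unknown = set(counts) - CATEGORIES
    if unknown:
        raise ValueError(f"uncategorised queries: {sorted(unknown)}")
    return counts


# Made-up example log, for illustration only.
log = [
    ("terminology", "What is a token ring?"),
    ("system", "Special characters lost in file transfer"),
    ("terminology", "What does 'socket' mean here?"),
    ("grammar/style", "What is the subject of this sentence?"),
    ("terminology", "What is a token ring?"),
]
counts = summarise(log)
print(counts.most_common(1))  # [('terminology', 3)]
```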


Terminology control 

If problems are ascribed to woolly understanding of 
the terms in the company documents, those who 
generate them are going to have to explain them, and 
then stick to those terms in the same sense and not use 
synonyms. There are less and more extravagant ways 
to do this. The right one is really only to be identified 
after some analysis of the problem. They include:

Glossaries. Simple lists of the new terms in a document
ought to be any author’s habit. In a larger
organization different people ought not to define
different meanings for the same words. Sometimes this is
inevitable, as many words have accepted meanings in
the context of a technology, and a different one in a
related one. A central company glossary would not
only reduce the chance of redefining a term, but also
show the environment it was created for.


Term banks. The printing, distribution and mainte- 
nance of glossaries is itself a significant cost. From 
the quality viewpoint the maintenance is the most 
significant. If a developer cannot broadcast that he has
just invented a super multi-synchronous widget,
someone else will, and their readers will not have any
idea what they are talking about. A dynamic corporate
termbank seems to be the only solution.


Of course we have already mentioned the interfacing
difficulties this may entail, but forewarned is
forearmed.
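A termbank that records each sense together with the subject field it belongs to covers both points made above: a term may legitimately mean different things in related technologies, but should not be silently redefined within one. The Python sketch below is a minimal, hypothetical in-memory model of that rule; a real corporate termbank would also need shared storage, update broadcasting and the interfacing discussed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Termbank:
    """Hypothetical termbank: at most one definition per term per subject field."""
    senses: dict[str, dict[str, str]] = field(default_factory=dict)

    def add(self, term: str, subject: str, definition: str) -> None:
        by_subject = self.senses.setdefault(term.lower(), {})
        existing = by_subject.get(subject)
        if existing is not None and existing != definition:
            # Refuse silent redefinition of a term within the same field.
            raise ValueError(f"{term!r} is already defined in {subject}")
        by_subject[subject] = definition

    def lookup(self, term: str) -> dict[str, str]:
        """Return every recorded sense of term, keyed by subject field."""
        return dict(self.senses.get(term.lower(), {}))


tb = Termbank()
tb.add("router", "networking", "device that forwards data between networks")
tb.add("router", "woodworking", "power tool for shaping edges and grooves")
print(sorted(tb.lookup("Router")))  # ['networking', 'woodworking']
```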


Translators — Urgent 


Work hard on your managements to get something in
place. In 1994. 
Abstract


The Internet, a network of computers with distinctive software and hardware, interconnects millions of people 
worldwide and offers tremendous amounts of information. Translators may benefit from the Internet throughout
the translation process. The Internet is growing so fast that trying to find the right information is like looking for
a needle in a haystack. This paper discusses some of the benefits of the Internet for the translator, and then points
out various tools and guides which can be used to get the most out of the Internet. 


Introduction 

Anything written or said about the Internet should 
start with a disclaimer. A self-organizing network of 
networks, the Internet is so volatile that whatever is 
said about it today is bound to be obsolete tomorrow. 
If you use the Internet to sharpen your hacking skills, 
you will appreciate its fast pace, the thrill of the hunt,
as you try new tools and tricks to delve even deeper 
into cyberspace. The hacker's thrill is the novice's 
nightmare. Like a tourist in a foreign land where 
people speak a strange language and road signs make 
no sense, the novice will feel utterly lost. Without a 
good guide, that is. This paper cannot provide such a 
guide. There are many of them already, serving 
different people with different needs. It will attempt 
to provide enough information on guides, sources, 
and the possibilities of the Internet, however, to 
allow translators eager to exploit its potential to get 
off to a good start. 

The Internet received its name from the fact that 
it connects a variety of computers with distinctive 
software and hardware. These computers are
internetworked to allow them to communicate by
translating messages into a mutually understandable
language referred to as a communications protocol.
The protocol used by the Internet is called TCP/IP. 
The first true Internet was Arpanet, a military 
network established in the United States in the late 
sixties. The following figures illustrate the 
phenomenal growth the Internet has experienced in 
the past decade. In 1981, 213 computers participated 
in the Internet. As of August 1993, the Internet 
comprised 15,160 different networks supporting at 
least 1,776,000 host computers in 60 countries.1
The number of users is estimated to grow by a 
million every few months. A conference on the
Internet which attracted some 200 people in 1983
recently had between 50,000 and 60,000 visitors in San
Francisco.


This vast interconnection of computers and the 
great popularity the Internet enjoys provide an ideal 
infrastructure for sharing resources. The Internet 
allows users to transfer files between incompatible 
computers, send messages across the globe, and log
into databases thousands of miles away in the
blink of an eye. There is no central agency to
co-ordinate the Internet. Each of its subnetworks is
administered separately, and support tasks are
accomplished by co-operative arrangements. Major
Network Information Centers (NICs) provide support
to users by making documents, other information and
various services available to them.2 Other agencies
co-ordinate Internet design, engineering and 
management. Internet networks within each country 
are organized in hierarchical order into national, 
regional, and local networks. Universities frequently
act as local hubs to provide Internet access to secondary
and primary schools as well as the public in general.


Internet access 

Before going on to discuss the uses of the Internet for 
the language professional, a word on getting access 
to the Internet is in order. You may distinguish 
between direct and indirect access. Indirect access
refers to electronic mail which may be sent to and
from the Internet from other networks through
interconnections called gateways. Users of CompuServe,
FidoNet, and national networks may send messages
to the Internet, participate in discussion groups and,
to a limited extent, access databases. To take full
advantage of the possibilities offered by the Internet,
direct access to a computer connected to it is
required. Universities, organizations, and large
companies frequently offer access to the Internet to
their affiliates. In the United States, some institutions
have made it their policy to offer Internet access to
anybody who asks for it. Other institutions make
accounts available to the public for a moderate fee.

In Europe, finding convenient access to the
Internet is a little more involved. If accounts are not
available through the computer centre of a university
near your home, access is possible through national
Videotex providers. Videotex access still carries a
hefty surcharge of $15.00 or more per hour in
addition to telephone and other charges which may accrue.
Complicated and expensive access procedures, along
with a lack of awareness, seem to be the main reasons
why the Internet has not really caught on with
translators. As the Internet continues to grow and gain in
popularity, the number of private companies providing
competitive Internet accounts will increase and
facilitate access for those not affiliated with large
institutions. Please refer to the appendix for a short
list of commercial Internet service providers.


Services available on the Internet 
The following services are available on the Internet: 
electronic mail, file transfer protocol, and telnet. 


Electronic mail 


Electronic mail, or e-mail, is used to send messages
to other people or programs on other computers.
It is the most extensively used service on the Internet.
Through gateways, it is possible to communicate
with the Internet from other networks such as
CompuServe, FidoNet, or Bitnet. Please refer to the
appendix for a short discussion of addressing
conventions.

Electronic mail is useful throughout the various 
stages of the translation process. Finding work is one 
of the many things e-mail facilitates. Job notices for 
translators appear regularly on several discussion 
groups. Unsolicited advertising of your own
services on special interest groups (SIGs) violates e-mail
etiquette, however, and may provoke a strong reaction
from other group members. Some translators have
started to use the Internet to communicate with their
clients and to receive and send translations. As the
commercial side of the Internet develops, it may turn
into a medium of choice for translator-client
communication. At present, using accounts provided by
universities and other non-profit institutions to carry
out private business is considered unethical and may
result in the loss of user privileges.

The most valuable resources available to
translators through e-mail are special interest groups.
Whatever the topic, there is a good chance that it will
be discussed at one or more SIGs. On Bitnet, these
groups are referred to as LISTSERVers; on UNIX
platforms, they are called USENET newsgroups.
LISTSERVers, named after a computer program
designed for the mass distribution of mail, are more
easily accessible from outside the Internet than
USENET newsgroups. The program keeps a list of
all members of the group. When a member posts a
message, it is redistributed to the mailbox of every
subscriber. Messages posted to USENET newsgroups,
on the other hand, are not redistributed to all
members. Only one copy of the message is stored on the
host computer. To read a message, a user must have
access to the host where it is stored. Because they are
more easily available, LISTSERV groups will be
discussed in greater detail. Most groups archive their
messages in a database, and it is possible to query the
archives via e-mail to retrieve old messages by topic,
date, keyword, or author. One group, Lantra-L, is
devoted to translation. Insoft-L deals with software
localization issues. There are many other groups
across a wide spectrum of fields that the translator
may find useful as a source of information. To
receive a list of available LISTSERV groups, send an
e-mail message to LISTSERV@BITNIC.BITNET
with the following text: LIST GLOBAL. You will
receive a rather large file in your mailbox listing all
discussion groups of which the LISTSERV program
is aware. To receive help on using LISTSERVers,
send an e-mail message to the address indicated
above with HELP as text.
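The LISTSERV requests described above are ordinary e-mail messages whose body is the command. As a minimal sketch, assuming a modern Python mail library rather than the mail software of the day, the LIST GLOBAL and HELP requests can be composed as follows; the function name and the sender address are hypothetical placeholders, and actually sending the message is left out.

```python
from email.message import EmailMessage

LISTSERV_ADDRESS = "LISTSERV@BITNIC.BITNET"  # address given in the text

def listserv_command(sender: str, command: str) -> EmailMessage:
    """Build a message whose body is a single LISTSERV command (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LISTSERV_ADDRESS
    # The command goes in the body, as the text describes; no Subject is set.
    msg.set_content(command + "\n")
    return msg

# Hypothetical sender address, for illustration only.
request = listserv_command("translator@example.edu", "LIST GLOBAL")
help_request = listserv_command("translator@example.edu", "HELP")
print(request["To"])
print(request.get_content().strip())
```

Keeping one command per message mirrors the instructions in the text; the reply (the list of groups or the help file) arrives in the sender's mailbox.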

SIGs may be very helpful in locating
information. If a translator has difficulty understanding a
term or a concept, a message sent to the right group
may provide a speedy answer. Most groups bring together
experts in the field from various countries, and since
many of them rely on the Internet for help, they are
usually most willing to provide it to others. Without
the help of the computer it may even be impossible to
find an expert. In addition, it is much easier to post a
note requesting help on a discussion group than to
pick up the phone book and call somebody whom
you have never met to ask for a favour. SIGs also
provide a good forum to exchange information with 
colleagues around the world and to stay abreast of 
the latest developments in the field. 


File transfer protocol 
File transfer protocol (ftp) requires full access
to the Internet. It makes it possible to transfer files
between two Internet computers easily and quickly.
Depending on network traffic, it may only take a few
seconds to transfer a megabyte of information from a
computer in London to a computer in Taiwan. Two
kinds of ftp access are possible: anonymous ftp and
full service ftp. Anonymous ftp allows anyone on the
Internet to access computers which contain public
file archives. The user simply logs on as anonymous
and transfers files located in the archives to his or her
computer. There are thousands of public ftp sites
around the world, providing a wealth of information
ranging from documents and software for personal
computers and mainframes to computer graphics
and a variety of other data. Full service ftp requires a
logon name and a password and provides access to
files outside the public archive section of the host
computer.

The main problem with using ftp is finding out
where a file is located. Archie is a tool designed to
facilitate file searches on ftp sites. Without it, finding
the right file is next to impossible unless you know
on which ftp site it is located. Archie scans the ftp
sites it knows about periodically and updates its catalogue of files.
Archie can either look for a particular set of
characters contained in a file name or produce a list of files
associated with a certain subject. To find ISO
standards, for example, type prog iso to produce a list of
files whose name contains the string ‘iso’. The command whatis
iso, on the other hand, produces a list of subjects with
which the string ‘iso’ is associated. Although it is
possible to query Archie using e-mail, full Internet
access is required to retrieve the actual files. To try
Archie, telnet to one of the Archie hosts, such as
archie.doc.ic.ac.uk in the United Kingdom or
archie.mcgill.ca in Canada. Refer to the appendix for
an example of a typical output produced by an Archie
query.
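At its core, a prog search is a substring match over a catalogue of file names gathered from many ftp sites. The Python sketch below is not Archie's own code; it uses a tiny in-memory catalogue, built from a few entries in the Hytelnet listing in the appendix, and a hypothetical prog function to show the idea of a case-insensitive lookup that reports host, directory and file name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogueEntry:
    host: str
    directory: str
    filename: str

# A few entries taken from the appendix listing (illustrative, not a live index).
CATALOGUE = [
    CatalogueEntry("animal-farm.nevada.edu", "/pub/ibm.pc", "hyteln65.zip"),
    CatalogueEntry("animal-farm.nevada.edu", "/pub/mac", "Hytelnet6.5.sit.hqx"),
    CatalogueEntry("askhp.ask.uni-karlsruhe.de",
                   "/pub/infosystems/hytelnet/unix", "hytelnet.tar.Z"),
    CatalogueEntry("askhp.ask.uni-karlsruhe.de",
                   "/pub/infosystems/hytelnet/amiga", "Ami-HyTelnet.lha"),
]

def prog(substring: str, catalogue=CATALOGUE) -> list:
    """Return entries whose file name contains `substring`, ignoring case."""
    needle = substring.lower()
    return [e for e in catalogue if needle in e.filename.lower()]

for entry in prog("hytel"):
    print(f"{entry.host}:{entry.directory}/{entry.filename}")
```

A real Archie server searched a far larger catalogue and also supported pattern matching; the point here is only that the user needs to know part of a file name, not its location.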


Telnet 

Telnet allows Internet users to log into other Internet 
computers to access online databases, electronic li- 
brary catalogues, Internet information services, or to 
access their own accounts.3 While ftp is available only to
users with full access to the Internet, many sites with
telnet access can also be reached through packet-switching
networks, such as CompuServe or Tymnet.
Similar to ftp, two levels of access are available 
through telnet. Many sites offer guest logins which 
provide limited, yet useful, capabilities. To gain full 
access to databases and other information services 
available through telnet, a password and a login 
name have to be obtained from the host organization. 
OPACs (online public access catalogues). Hundreds
of university libraries all over the world have
made their catalogues available online. The ability to
search catalogues at a number of institutions from
your desktop offers tremendous potential for
identifying resources. Keyword searches may produce
items not available at your local library. In addition
to locating resources, OPACs may also be very
helpful in terminology work. The dictionary offers venous
thrombosis and thrombosis of the veins for the German
Venenthrombose. A computer search produced
several books with venous thrombosis in the title, but
none with thrombosis of the veins, thus providing a
good indication as to which equivalent to use. In
addition, remote catalogues may offer detailed
cataloguing information for certain types of publications,
such as government reports or periodicals. Many
OPACs also offer access to specialized databases,
ranging from manuscript and statistical collections
to full text documents, such as newspapers,
periodicals, or literary classics.4 Using these capabilities,
translation projects can be researched in a fraction of
the normal time. Several tools are available to find
out which libraries provide their catalogues online.
One of them is Hytelnet, a hypertext guide to online
libraries and other Internet sources. Hytelnet resides
in memory and waits to be activated with a hotkey.
Hytelnet also provides information on
FreeNets, Campus Wide Information Systems
(CWIS) and other telnet information sites. Please
refer to the appendix for a list of ftp sites for Hytelnet.
The other tool is a document compiled by Art St.
George and Ron Larsen, entitled Internet accessible
library catalogs & databases. Please refer to the
appendix to see where it is available.
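The venous thrombosis example above amounts to counting how often each candidate equivalent appears in catalogue titles. A minimal Python sketch of that comparison, assuming the titles have already been retrieved from an OPAC; the function name is hypothetical and the titles below are invented placeholders, not real search results.

```python
def count_title_hits(titles, candidates):
    """Count how many titles contain each candidate term (case-insensitive)."""
    lowered = [t.lower() for t in titles]
    return {term: sum(term.lower() in t for t in lowered) for term in candidates}

# Hypothetical titles standing in for the results of an OPAC keyword search.
titles = [
    "Venous thrombosis: diagnosis and treatment",
    "Prevention of venous thrombosis after surgery",
    "Anticoagulant therapy in venous thrombosis",
]
hits = count_title_hits(titles, ["venous thrombosis", "thrombosis of the veins"])
print(hits)
```

A clear imbalance in hits, as in the example in the text, suggests which equivalent is established usage; a near tie would call for further checking.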

Text archives. There are various projects around the 
world which collect language corpora for research 
and other purposes. Some of these corpora are made 
available to the general public. Corpora are very 
useful to the lexicographer and terminologist look- 
ing for usage examples. In many cases, the corpora 
are distributed with specialized access or
concordance software. At Georgetown University, a
Catalogue of Projects in Electronic Text was
established which can be searched online. It provides
detailed information on the various electronic text
projects, such as Project Gutenberg at the University
of Illinois, ARTFL at the University of Chicago, or
the Oxford Text Archive. The database is available
via telnet or modem. Please refer to the appendix for
a contact address.
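Concordance software of the kind distributed with these corpora typically displays keyword-in-context (KWIC) lines, which is what makes corpora useful for finding usage examples. The Python sketch below is a generic KWIC routine under simple assumptions (whitespace tokenization, a fixed window of words on each side); it is not the software shipped with any particular archive, and the function name and sample text are hypothetical.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list:
    """Return keyword-in-context lines: `width` words either side of each hit."""
    words = text.split()
    key = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation so "term," still matches "term".
        if word.lower().strip(".,;:!?") == key:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

sample = ("The term appears in context. Translators compare each term "
          "with its neighbours to see how the term is used.")
for line in kwic(sample, "term"):
    print(line)
```

Right-aligning the left context keeps every occurrence of the keyword in the same column, which is the usual concordance layout.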

Specialized databases. I would like to mention the
databases provided by the European Community as
an example of specialized databases accessible
through the Internet. ECHO, the European Community
Host Organization, offers a wide range of unique
information services online. They are of special
interest to the translator because of their multilingual
nature. The databases range from information on
research projects and EC documents to a multilingual
terminology database. The latter, EURODICAUTOM,
is of special interest to translators and terminologists.
It contains technical and scientific terms as well as
contextual phrases. It allows searches limited to
certain fields as well as free text searches. Many of the
terms contained in EURODICAUTOM are not yet
available in printed form, increasing the value of the
database. To use ECHO databases, telnet into
ECHO.LU. When prompted for a user code, simply
type GUEST. Users of the national Videotex
services, such as the German Btx, the French Teletel,
or the British Prestel Service, also have immediate
access to ECHO. A contact address for ECHO
customer service is provided in the appendix.

The databases listed above are only a small sam- 
ple. The databases available through the Internet 
would fill a book. The next section will discuss some 
of the online tools which help the user navigate 
through the Internet by providing a more uniform 
and user-friendly interface. 


Navigating through the Internet 

WAIS (Wide Area Information Servers)
One of the problems with using diverse databases is
that the user is confronted with a new and unfamiliar
interface for each database. WAIS addresses this
problem by providing a uniform, user-friendly
interface on the user’s desktop computer which translates
the query into a language the database can
understand. Using WAIS, it is possible to read
newspapers, get information on any country in the world,
scan many specialized databases, and obtain
information about the Internet. WAIS is particularly suited
for retrieving textual information. It can search
through many large documents for the occurrence of
keywords and provide a scored list of the documents
which are most relevant to the query. Using a
technique called ‘relevance feedback’, the user can
customize the scoring process to make WAIS even
more effective. A feature of particular interest to
translators is the capability of WAIS to collect all
items of interest automatically on a daily, weekly, or
monthly basis. Thus, if you are interested in the use
of a certain financial term, you can retrieve all
documents in which it occurs in the Wall Street Journal
and instruct WAIS to send you an updated list of
documents containing the term on a regular basis. To
try WAIS, telnet to quake.think.com and log in as
WAIS.
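The idea of a scored result list can be illustrated with a much simpler ranking than a WAIS server actually uses: score each document by how often the query keywords occur in it and sort by score. This Python sketch makes that simplification explicit; it is not the WAIS scoring algorithm, relevance feedback is not modeled, and the function name and documents are hypothetical.

```python
import re
from collections import Counter

def score_documents(documents: dict, query: str) -> list:
    """Rank documents by the total number of query-keyword occurrences."""
    keywords = [k.lower() for k in query.split()]
    ranked = []
    for name, text in documents.items():
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        score = sum(counts[k] for k in keywords)
        if score > 0:  # documents with no matches are left out
            ranked.append((score, name))
    # Highest score first; ties broken alphabetically by document name.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked

docs = {  # hypothetical documents
    "report-a": "Hedging and derivatives: a hedging strategy for banks.",
    "report-b": "Quarterly earnings rose; no hedging was reported.",
    "report-c": "Weather forecast for the weekend.",
}
print(score_documents(docs, "hedging derivatives"))
```

Rerunning such a query on a schedule and comparing the results with the previous run is, in spirit, what the automatic daily or weekly collection described above provides.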


Gopher 
Gopher servers provide a menu driven interface to 
allow users to browse through the network without 
knowing telnet addresses and ftp sites. If the user
finds a menu item that looks interesting, Gopher will retrieve
the information associated with it when the user presses a
mouse button or the ENTER key. Gopher provides
easy access to Archie servers, online library guides,
ftp sites and numerous other Internet information
sources. The number of Gopher systems is increasing
rapidly, and the software is becoming more
sophisticated. All Gopher sites connect to one
another, allowing the user to explore virtually any part
of the Internet by following a simple menu system.
Gophers do not provide anything that is not available
via telnet or ftp.5 They just allow the inexperienced
user to gain access to information without having to
learn telnet and ftp commands and browse through
numerous guides to find out where the information is
located. To experiment with a Gopher server, telnet
into gopher.cic.net or any of the other Gopher sites.
One of the drawbacks of Gopher is that the user
has to negotiate several layers of menus before
locating the right information. VERONICA, which stands
for ‘Very Easy Rodent-Oriented Netwide Index to
Computerized Archives’, is an attempt to streamline
Gopher searches by permitting the user to search
Gopher menus by keywords to find relevant menus
more quickly. The number of VERONICA systems
is still small. Thus, the ones that do exist are used
heavily, resulting in slow response times. The
popularity of VERONICA is bound to increase as more
people learn about it, and within a year, most major
Internet sites will probably have VERONICA
installed.6
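Behind Gopher's menus is a very simple text protocol, documented in RFC 1436: a menu arrives as one line per item, made up of a one-character item type followed by the display text, then the selector, host and port separated by tabs, with a line holding a single period marking the end. A minimal Python parser for such a menu, run on a hypothetical menu rather than a live server; the host name example.org and the selectors are placeholders.

```python
from typing import NamedTuple

class MenuItem(NamedTuple):
    item_type: str  # e.g. '0' text file, '1' submenu, '7' search server
    display: str
    selector: str
    host: str
    port: int

def parse_gopher_menu(raw: str) -> list:
    """Parse Gopher menu text (RFC 1436 layout) into MenuItem tuples."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # a line holding a single period ends the menu
            break
        fields = line[1:].split("\t")
        if len(fields) < 4:  # skip blank or malformed lines
            continue
        display, selector, host, port = fields[:4]  # any extra fields are ignored
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items

# Hypothetical menu text; selectors and host names are made up for illustration.
menu = (
    "1Library catalogues\t/libraries\tgopher.example.org\t70\r\n"
    "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
    "7Search menu titles\t/search\tgopher.example.org\t70\r\n"
    ".\r\n"
)
for item in parse_gopher_menu(menu):
    print(item.item_type, item.display, "->", f"{item.host}:{item.port}{item.selector}")
```

Because every item carries its own host and port, a submenu can live on a different server entirely, which is what lets users move between Gopher sites simply by choosing menu entries. A type '7' item is a search server, the kind of entry through which keyword searches such as VERONICA are offered.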


World Wide Web 


World Wide Web (WWW) is similar to Gopher
and WAIS. However, instead of a menu-driven design,
it uses hypertext. WWW is not as widely available as
Gopher, nor is it as fully developed. Since hypertext
is a very powerful organizational tool, WWW holds
great promise. To experiment with WWW, telnet
into info.cern.ch, located in Geneva, Switzerland.


Outlook on the future 
The Internet is growing at a phenomenal pace. The 
inter-connectivity it provides between different com- 
puter platforms has attracted the attention of a large 
number of commercial users. Several years ago, 
most of the users of the Internet were part of an elite 
who used high-powered computers to transfer mas- 
sive amounts of research data across the continent. 
Today, a growing number of users is interested in the 
Internet for commercial reasons, pushing the re- 
search elite into the background. Thus the Internet is 
going through a transition from a research and edu- 
cational to an increasingly commercial network. 

One of the reasons that commercial applications 
have not appeared sooner is that the Internet was 
started for research purposes with government seed 
money. No provisions were built in for billing users. 
It was of no concern at the time. As more and more
people have realized the commercial potential of the
Internet, companies have begun to think seriously
about finding ways to charge users for online
services they would like to offer. Cable companies, such
as Continental Cable or the Hybrid Corporation in
San Francisco, are beginning to offer high-speed
access to the Internet through cable-TV lines,
enabling everybody with cable access to participate in
this global network. Since most people do not care to
learn UNIX, the operating system used on most
Internet sites, user-friendly interfaces have been de- 
veloped which allow the users to hop across continents 
with a few mouse clicks. These interfaces make it 
possible to access text, graphics, sound, and full- 
motion video, turning the Internet into a true 
multi-media environment. These capabilities hold 
tremendous potential for language professionals. 
When doing a translation on arthroscopy, for exam- 
ple, a click on a camera icon will show an animated 
sequence of the insertion of the cannula into the 
knee. Click on any word in the text of the article, and 
the program will instantly look it up in its online 
dictionary. The technology to do it is here today. 
What is needed now is a concerted international 
effort on the part of the profession to co-operate in 
the compilation and dissemination of terminological 
and other data. Some of these efforts, such as the 
Text Encoding Initiative, are alive and well. But 
more are needed. Online research techniques should 
be incorporated in translator training, as some insti- 
tutions have begun to do. Gopher servers should be 
set up to distribute materials for case studies for 
translator training, terminology work, and other ar- 
eas of interest to the profession. 

It is important that more translators get actively
involved in the Internet for several reasons. The
commercialization of the Internet, while providing
some advantages, also involves many risks. Private 
information providers may attempt to restrict infor- 
mation access to corporate users who can afford to 
pay several hundred dollars an hour for online time. 
One of the great things about the Internet is its 
flexibility, and if enough translators get involved, 
either personally or through professional groups, we 
may be able to influence its development and keep it 
a place where information is available to everybody. 

In addition, if more translators get involved, the 
amount of useful information available to the
profession through the Internet will increase. Although the
Internet is very helpful, it has yet to live up to its
potential for the language professions. Thus we should
not wait and see what the Internet can do for us, but
think about what we can do for it. If only half of the
terminological papers written at various translator 
training institutes throughout Europe were put online, 
it would be a big step forward. I hope that this paper 
has aroused enough of your curiosity to venture out 
into the Internet and explore. Only through our ac- 
tive participation will the Internet become a true 
treasure trove. 

APPENDIX


The appendix provides a brief list of public access
points to the Internet, sources of information, and
tools.


Commercial Internet providers and other 
contact addresses: 

AlterNet USA and International 
Tel: +(703) 876 5050 
info@uunet.uu.net 


ANS USA and International
Joel Maloff
Tel: +(313) 663 7610
maloff@nis.ans.net


PSInet PSI, Inc.

Tel: +(703) 620 6651

Offers Global Dialup Service for USA 
and international access. It costs 
$39.00 a month and is available 
through the local dialup in about 160 
cities throughout the USA. You can 
reach PSI through X.25 PAD access 
at 3110607400136602 or through 
TELENET/SPRINTNET by typing ‘c 
psinet, ACCOUNTNAME’. On the 
Internet, send mail to info@psi.net 
for more information. 


Georgetown Catalogue of Projects in Electronic Text
Paul U. Mangiafico, Project Assistant
The Center for Text and Technology
Academic Computer Center
238 Reiss Science Building
Georgetown University
Washington, DC 20057
Tel: +(202) 687 6096
Fax: +(202) 687 6003
Internet: pmangiafico@guvax.georgetown.edu


ECHO ECHO Customer Service 
BP 2373 
L-1023 Luxembourg 
Tel: +(352) 34 98 11; 
Fax: +(352) 37 98 1234. 


Addressing protocols 


The addressing protocols for the major networks are 
listed below. A good online source of information 
about gateways is The inter-network mail guide, by 
John Chew. It is available via ftp from ra.msstate.edu; 
directory: pub/docs; file: internetwork-mail-guide. 
Network: Applelink 

To Internet: user@domain@DASNET# 


From Internet: user@applelink.apple.com

Network: CompuServe

To Internet: Internet:user@host.domain 

From Internet: user@compuserve.com 
Note: the comma in the CompuServe user ID
changes to a period when sending mail from the
Internet, e.g. xxxxx,yyy -> xxxxx.yyy@compuserve.com
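The CompuServe rule above is mechanical enough to express in a few lines. A minimal Python sketch; the function name is hypothetical and the numeric ID is a made-up example in the xxxxx,yyy form.

```python
def compuserve_to_internet(user_id: str) -> str:
    """Convert a CompuServe user ID such as '71234,567' to its Internet address."""
    user_id = user_id.strip()
    left, sep, right = user_id.partition(",")
    if not sep or not left.isdigit() or not right.isdigit():
        raise ValueError(f"not a CompuServe ID of the form xxxxx,yyy: {user_id!r}")
    # The comma becomes a period, and the CompuServe domain is appended.
    return f"{left}.{right}@compuserve.com"

print(compuserve_to_internet("71234,567"))  # 71234.567@compuserve.com
```

Rejecting IDs without a comma catches the common mistake of converting an address twice.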


Network: FidoNet 

To Internet: send to UUCP at nearest
gateway site and set first line of
message to ‘To: user@domain’

From Internet: user@Node.Net.Zone.fidonet.org


Archie listing of Hytelnet ftp sites 

Host animal-farm.nevada.edu

    Location: /pub/ibm.pc
      FILE -rw-r--r--  613480 Jun 24 09:37  hyteln65.zip
    Location: /pub/mac
      FILE -rw-r--r--  600064 Feb  9 00:00  Hytelnet6.4.sea.bin
      FILE -rw-r--r--  733154 Jun 24 09:42  Hytelnet6.5.sit.hqx
      FILE -rw-r--r--    1820 Jul 29  1992  hytelnet.readme

Host askhp.ask.uni-karlsruhe.de

    Location: /pub/infosystems
      DIRECTORY drwxr-xr-x  1024 Aug 25 17:37  hytelnet
    Location: /pub/infosystems/hytelnet/amiga
      FILE -rwxr--r--   84309 Jul 14  1992  Ami-HyTelnet.lha
      FILE -rwxr--r--     275 Jul 14  1992  Ami-HyTelnet.readme
    Location: /pub/infosystems/hytelnet/mac
      FILE -rwxr--r--  733154 Jun 24 05:21  HyTelnet6.5.sit.hqx
    Location: /pub/infosystems/hytelnet/pc
      FILE -rwxr--r--  613480 Jun 21 04:45  hyteln65.zip
    Location: /pub/infosystems/hytelnet/unix
      FILE -rwxr--r--   21125 Jun 21 07:14  hytelnet.tar.Z

Other information sources and books
on the Internet


Library catalogues: 

Art St. George and Ron Larsen, Internet accessible
library catalogs & databases, available via ftp from:
nic.cerf.net; directory: cerfnet/cerfnet_info; file: 
Internet-catalogs-XX-XX.txt where XX-XX stands 
for the month and year of the most recent version. 


Internet resources:

Judi Harris and the students of TEB 8000. Internet 
Resources Directory, Part 3: File Archives (FTP 
Sites) of Interest to Educators. Available via ftp from 
ftp.virginia.edu (128.143.2.7); directory: /pub/IRD/; 
file: IRD-FTP-sites.txt. 


Judi Harris and the students of TEB 8000. Internet 
Resources Directory, Part 4: Ideas for Curricular 
Infusion of Telecomputing Tools and Resources. 
(July 1992) 


J. Paul Holbrook and Christine S. Pruess, editors. 
CICNet Resource Guide. (June 1992). Available via 
ftp from nic.cic.net (192.131.22.2); directory: pub; 
file: resourceguide. 


Introductory books 
KEHOE, B. P. Zen and the art of the Internet. Prentice
Hall, 1993. 112pp. $22.50. ISBN 0-13-010778-6.


KROL, E. The whole Internet user's guide & catalog.
Sebastopol, CA: O'Reilly and Associates,
1992. 376pp. $24.95. ISBN 1-56592-025-2.


Advanced books 

FREY, D. & ADAMS, R. !%@:: A directory of electronic
mail addressing and networks. Sebastopol, CA:
O'Reilly and Associates, 1991. ISBN 0-937175-15-3.

Abstract


Paper discussing the promotion of library services in a specialist library in a voluntary organization. It gives
background on the history of the information service and an outline of existing current awareness services, and
asks how, alongside ongoing promotional activities and methods that encourage users to take part in running
their library, end user independence can be encouraged in an information environment with
limited user potential.


Christian Aid is a development agency working in
over 70 countries in the Third World, across three
regions (Asia/Pacific, Latin America/Caribbean and
Africa/Middle East), in conjunction with the churches
in the United Kingdom and Ireland. It funds long-term
development projects in addition to providing
emergency relief in times of disaster, such as famines,
floods and earthquakes. A staff of 180 people is
based at offices in London, with 100 staff based
around the UK and Ireland, raising awareness in
local churches and communities.

The library began as a press cuttings library in the 
1970s. It now provides an extensive current
awareness service to its own staff on subjects as diverse as
liberation theology, women, Aids, country
information and statistics. The library consists of over 10,000
documents and 500 periodicals, from New
Internationalist to The Tablet. We also work closely with
external agencies such as Cafod, and the World 
Development Movement, and encourage use by any- 
one with an interest in the Third World and 
development issues, from students to journalists and 
film makers. There is a staff of three: a librarian and 
two library assistants. 

The provision of a current awareness service 
consists of a daily press report, offering a summary 
of selected articles from the quality UK and Church 
press which is widely distributed within the 
organization and externally by subscription and 
electronic-mail to Aprodev (Association of Protes- 
tant Development Agencies) in Brussels where it is 
made available to European development agencies. 
A bi-monthly periodicals bulletin consists of peri- 
odicals abstracts from journals received by the 
library, and a weekly publications list provides a
list of all items received in the Library in the
previous week.

Electronic information is becoming an increas- 
ingly important area of the library's work, to keep 
staff informed on areas of interest. All library staff 
are responsible for downloading information via
Geonet and GreenNet for each of the regional groups
on a weekly basis, for up-to-date information on the
Philippines, Latin America, South Africa and
Structural Adjustment Programmes. We also access
information on international conferences such as last 
year's Earth Summit, and more recently the Human 
Rights Conference in Vienna. 

We are constantly looking at ways of improving 
the service and raising the library's profile. Our 
proactive service is well received but we would 
like more end user independence and more physi- 
cal use of the library for reading and research. 
Reports are frequently written on promoting public
and academic library services, for example articles
by Lee1 on marketing academic libraries and
Kinnell2 on promoting public library services, but
there is less written on promoting specialist
libraries like our own with limited user potential.
However, one recent article by McCarthy3 on
promoting an in-house library focused on four means
of promotion: the information professional, the in- 
formation service, packaging of products, and 
promotional activities. 

At Christian Aid we have several methods of 
promotion to market the library to people's specific
needs, including library guides for new staff, Area
staff and visitors to the Library, the bi-monthly
Periodicals Bulletin, printed bibliographies on subjects
of interest and Christian Aid campaigns, e.g. trade, 
debt, population and environment. To encourage end 
user independence and also to raise the profile of the 
library and library staff, library training courses were 
introduced this year. 

The overseas groups at Christian Aid encourage 
staff from other sections such as Communications, 
Fundraising, Aid and Education to become associate 
members of their groups. This involves attending 
quarterly group meetings for discussions on a topical 
issue, such as election monitoring in Kenya or
neo-liberalism in Latin America. The Library staff are
members of the three overseas groups, which provides
a further opportunity to raise the library's profile and 
allows for an exchange of ideas and information and 
the means for library staff to keep in touch with areas 
of interest in the rest of the organization. 

Due to a change of emphasis, the Library ceased to
be part of the general Christian Aid induction course 
for new staff this year. However, we have begun our 
own training courses, since purchasing new computer 
software, Diderot by Polyphot, 12 months ago, which 
has greatly enhanced the effectiveness of the library 
and opened up staff time for other things. User friendly 
interfaces and improved search speeds have reduced 
staff time spent on routine library procedures and 
facilitated the introduction of user training, and a 
stock edit project with the help of volunteers. 

The library training is informal and designed to
raise awareness and the profile of the library and to
encourage users to use the library and its resources
independently. Lasting an hour and a half, it is
available to everyone at Christian Aid. All library
staff take part, each with a half-hour slot, which
gives more time for discussing specific services and
reference materials and answering questions; the
small, informal groups also promote discussion and
more effective communication.

Training is in three parts, consisting of a general 
introduction to the library, library services, and the 
reference section; use of the computerized catalogue 
for searching for books, documents and periodical 
abstracts (here people can get hands on experience); 
and a demonstration of how we access electronic 
mail bulletin boards via GreenNet and Geonet. Train- 
ees come in groups of up to four people and we try to 
encourage people from the same sector to come 
together, in order to be able to provide more focused 
training. Great interest has been generated in the 
courses from all sections of the organization, from 
new staff, regular library users and others usually 
prevented from using the library by the demands of 
their work. Twelve months on, the training is now
geared more to new staff, as most of the more
experienced interested staff have already attended.

A course evaluation form enables people to tell
us about the design and content of the course, what,
if anything, is being omitted and what improvements
can be made; all valuable criticism for library staff.
The criticisms most often voiced concern lack of time
to use the library due to heavy work schedules, the
length of our training courses, and the need for
balance and focus when trainees are staff with
different levels of experience. Most people say they
would like more experience of using electronic mail
themselves. This has implications for the library,
library staff and Christian Aid as a whole: a modem
is needed, and there are only two at Christian Aid,
one in the library and one in the computer
department; there would also be additional training
requirements.

Apart from limited use of the computer system,
library use has changed little, in spite of the interest
generated in the training sessions. Statistics show an
increase in use of general current awareness services,
but there is still little independent physical use of the
library.
This state of affairs can be put down partly to our
location on the fourth floor, at the top of the
building: use by fourth-floor staff is much more
pronounced than use by staff on other floors. Lack of
space in the library also inhibits use, as the small
reading area is suitable for a maximum of only five
people, and the stock is packed tightly on shelves,
which does not facilitate browsing. Though our
training courses encourage users to help themselves
and use the computer, this is hard to achieve in
practice with only two terminals in the library, both
in constant use by library staff or volunteers.
Another factor is the low priority staff can give to
research in the library within their already
demanding work schedules.

Evaluation of library services is essential to the
running of a library, to ensure that services remain
focused and relevant to users' needs. Building on the
idea of letting users tell library staff how existing
services could be improved, what they require, and
what would help them give reading and research a
higher priority in their working day, we designed a
library questionnaire and distributed it to a random
selection of London and Area staff. The results of
the questionnaire have been
positive in terms of people’s satisfaction with exist- 
ing library services. Staff suggestions offered for 
ways of improving the library and increasing their 
personal usage include ideas such as book/author 
displays, a library plan, refreshments (!), lists of 
useful reference material, regular subject updates, 
and a film and fiction section including work from 
developing countries. 

Building on staff ideas from the questionnaire,
we have designed a library plan and, in response to
user demand, introduced an evaluation section into
the book classification sequence. We are considering
mounting regular book and author displays, and a
newsletter to promote existing and new library
services. A press report database, together with a
scanner for reading newspaper articles and
transferring them on to a VDU screen, will make it
easier to search for and locate press cuttings.
Computer networking throughout Christian Aid could
produce more end user independence, giving users
the facility to browse and search the library
catalogue for books, documents and press cuttings
from the comfort of their own desks or homes.

Follow-up training sessions, with more practical
training in using the library, may help end users to
put into practice what they learned in the first
session, to increase their confidence and to promote
independence.

At Christian Aid Library we believe we are
pursuing an appropriate methodology to promote
our services, but we have been less successful in
promoting end user independence. While users are
enthusiastic about the work we do in disseminating
information and about exciting developments in
information provision, such as the use of electronic
information, the reality is quite different. How do
information professionals in specialist libraries
persuade staff who are already overburdened with
work to give personal reading and research a higher
priority?
Abstract

This paper puts into perspective the importance of the European project Train-ISS, taking into account the
experience gained by one of the contractors, INETI/CITI (Instituto Nacional de Investigação e Tecnologia
Industrial/Centro de Informação Técnica para a Indústria), in developing the information market in Portugal. The
most important landmarks in the development of the information market in Portugal are outlined. One of the
factors restricting this development has been the shortage of trained information professionals. Two projects
developed by INETI/CITI, in collaboration with the University of Sheffield, Department of Information Studies, to
counteract this shortage are outlined. These are the Postgraduate Course for Information Intermediaries, an
intensive six-month training course which ran for three consecutive years, 1987-1990, and the MSc in Information
Management from the University of Sheffield delivered at INETI. The paper concludes by addressing the skills of
the professional information specialists trained by INETI and potentially by the Train-ISS advanced training
programme in Information Management. It outlines the main contents of the programme leading to a Diploma in
Electronic Information Management. Those who wish to continue and submit a dissertation within one year of
obtaining the diploma would be awarded the MSc in Electronic Information Management.


1. Introduction 

This paper takes into account the experience gained 
by INETI/CITI in developing the information mar- 
ket in Portugal. This includes our participation in 
different roles for several national initiatives and our 
experience with Sheffield University through the 
implementation of two advanced training pro- 
grammes in Information Management. 


2. Brief overview of the development of the 
information industry in Portugal 

The development of the information industry in 
Portugal can best be illustrated by looking at some 
of its most important landmarks: 


a) Comissão Coordenadora de Investigação da
Indústria de Informação (CCI-Indústrias de
Informação)

Coordinating Commission for Research in the
Area of the Information Industry

In 1986, the information industry was identified as
one of the priority areas to be addressed and the
Comissão Coordenadora de Investigação da Indústria
de Informação was created. Before CCI began its
activity, a report was commissioned from a working
group. Several topics were addressed, including the
need for training of information specialists and for
initiating R&D in information science.


b) The Programme for the Development of an
Information System for Industry in Portugal

This Programme, set up in 1987, envisaged the
creation of information nodes at Industrial
Associations (IAs, the Portuguese equivalent of
Chambers of Commerce).

The approach used to implement this Programme,
for which INETI/CITI was appointed as the
coordinating organization, included:
i) the creation of a network of information nodes
at several IAs located in different regions of the
country;
ii) the staffing of these nodes with information
intermediaries (a topic I return to later).


c) Investigation of LFR participation in pilot/
demonstration projects in the information
services market

The studies carried out by IMO demonstrated
that the gap between the more advanced countries
and the less favoured regions (LFRs) was increasing,
due to the weakness of the information infrastructures
and the lack of the know-how required to use and
exploit information services in these regions. The low
level of participation of organizations from peripheral
regions (Greece, Ireland, Italy, Portugal and Spain) in
the IMPACT 1 Call for Proposals on pilot/demonstration
projects jeopardized community cohesion in this
market area. Taking this into account, the CEC
commissioned a study (following five parallel studies
carried out in the LFRs) to identify measures to
strengthen the information market in those regions.

The report pointed out several factors which
hindered the development of the information
services market in the LFRs and concluded that
the information market in those regions is still at an
immature stage of development.


d) PITIE: Programa Integrado de Tecnologias
de Informação e Electrónica

Integrated Plan of Information Technology and
Electronics

Taking into account the strategic role of information
technologies and the electronics sector in the
modernization of Portugal, it was decided in 1990 to
introduce a sectoral programme devoted to the
development of those sectors. This was named PITIE
(Programa Integrado das Tecnologias de Informação
e Electrónica). One of the objectives of this
programme was the creation and development of
new companies in the information technology and
electronics sectors.


PITIE, which became a national strategic
programme, took into account the reality of the
information industry sector in Portugal and the
experience gained from the previous initiatives.


From January 1990 to June 1992, 242 applications
were made for PITIE funding. Of these, only 85
were approved.


The highest number of approved projects (69)
came from the software area; the second highest
number (19) came from the information industry
sector.


The new Information Projects being developed 
with financial support from PITIE employ technolo- 
gies such as online, magnetic and optical discs, 
videotex, multimedia, audiotex, and cover a variety 
of types of information such as: geographical, cul- 
tural (museum information), legal, financial, 
economic and entertainment. 


Usually, the application of emerging technology 
and consequently the development of ‘know-how’ in 
new technologies weighed heavily in the final deci- 
sion. In other cases, the projects approved used 
established technologies, but the decision to provide 
financial support was based on the fact that the 
information that they will archive was information 
generated in Portugal, which would have been lost if 
an information product had not been developed, and/ 
or that some novelty was achieved in the final prod- 
uct through the convenience of presentation of the 
information. 


Another aspect that deserves mention is that the
majority of companies whose projects have been
approved were recently created (1989). The
availability of PITIE funds has certainly contributed
to the consolidation of their activity in the emerging
information industry sector.


Thus, PITIE has been the first tangible govern- 
mental initiative to support the development of this 
new sector, by making funds available. 
e) Portuguese participation in IMPACT 1 Pilot/
Demonstration Projects and IMPACT 2 Strategic
Information Initiatives

Only two of the 19 projects selected for IMPACT 1
had a Portuguese organization, and then only as
partners. One of these was Ulysses International, a
project aimed at making tourism information more
readily available, and the other was HYPP
(Hypermedia for Plant Protection).

IMPACT 2 Multimedia Information (MMI) and 
Geographical Information Services (GIS) Call for 
Proposals have attracted a higher number of Portu- 
guese organizations to submit proposals. Nine 
Portuguese companies have submitted projects to 
MMI as leaders and several other Portuguese compa- 
nies were partners. Of those nine only one has received 
financial support for phase 1 and it has also been 
approved to receive financing for phase 2. Seven 
projects involving Portuguese organizations, only as 
partners, have received IMPACT funding within 
phase 1 of IMPACT 2 GIS initiative. This shows 
that, as with other countries, Portuguese SMEs are 
being attracted to invest in this new sector. 

As recognized by the CEC, this is a result of the
increased awareness of the opportunities in the
information market brought about by the actions of
the NFPs (National Focal Points) and NAPs (National
Awareness Partners) in the LFRs, within the
framework of IMPACT 2 Action Line 3.

These are encouraging signs that Portuguese
entrepreneurs are willing to play an active part in this
sector.


3. Developing the market, creating the demand 
The information market is characterized by its broad 
base and wide range of products. Since a large number 
of information products already existed, although 
not in the Portuguese language, it was decided that 
the creation and nurture of the Portuguese informa- 
tion market could best be achieved through 
stimulating demand. CITI has employed two
methods. The first, and most conventional, was
making presentations at seminars and conferences,
arranging demonstrations and using media coverage.
The second, less obvious, was to train information
intermediaries and place them strategically in the
business and industry sectors to satisfy and stimulate
demand for information products. This explains why
CITI, when charged with coordinating the Programme
for the Development of an Information System for
Industry (referred to above), placed such strong
emphasis on the training of information intermediaries.


4. The Postgraduate Course for Information
Intermediaries

The lack of trained information professionals, and of
the means to provide them, were both highlighted by
the introduction of the Programme for the Development
of an Information System for Industry in Portugal
(referred to at para. 2b). It was immediately apparent
that there was a shortfall of trained information
professionals with the necessary skills to staff the
nodes created at IAs.

Consequently, CITI took the opportunity to pro- 
vide training for the information intermediaries, 
through the creation of a Postgraduate Course for 
Information Intermediaries, an intensive six-month 
training course, with the collaboration of the Depart- 
ment of Information Studies of the University of 
Sheffield. This course ran for three consecutive years, 
1987-1990, and sixty information intermediaries have 
been trained. 

It was recognized that the successful implemen- 
tation of the Programme for the Development of an 
Information System for Industry depends largely on 
the existence of information intermediaries with the 
ability to: 

i) seek, identify, select, process and present in- 
formation (existing in any format — paper, 
online, CD-ROM) in a form adapted to the 
specific needs of users in industry; 

ii) contribute towards the identification of exist- 
ing information resources (data collections) on 
paper, or any other format and to computerize 
them (online, CD-ROM, diskette) to make them 
available to users in industry. 

The success of this ‘fire-fighting’ response and 
the ready acceptance by the market of these sixty 
information professionals created the environment 
in which more permanent training arrangements could 
be considered. The development of the Master’s 
degree in Information Management of the University 
of Sheffield, taught at INETI (formerly LNETI) since 
1991, was a natural consequence of the close asso- 
ciation between the two organizations during the 
teaching of the six-month Postgraduate Course for 
Information Intermediaries. 


5. The MSc in Information Management from 
the University of Sheffield in Portugal 
For the past two and a half years, the Department of
Information Studies, University of Sheffield (USDIS),
has been collaborating with INETI and its Centre for
Technical Information for Industry (CITI) in offering
the MSc in Information Management course with
the support of PEDIP (Programa Específico para o
Desenvolvimento da Indústria Portuguesa, Specific
Programme for the Development of Portuguese
Industry), a programme aimed at creating the
conditions to enable Portuguese industry to adapt to
the new roles and challenges of the EEC open market.


The course is run under an agreement between INETI
and the University of Sheffield whereby, with the
financial support of PEDIP, the course preparation,
teaching, and travel/subsistence expenses are paid to
the University of Sheffield; additional local costs
(local teachers, library costs, online access and
telecommunications costs, etc.) are met from the
same source. Students
register for the degree of MSc in Information
Management at the University of Sheffield, but the
course is taught in Lisbon on the premises of INETI at
Lumiar. The course consists of twelve months of
taught modules followed by a six-month period during
which the student undertakes an independent
research project and dissertation. The version of the
course delivered under the INETI/USDIS protocol
includes some modifications and inputs from local
staff, which have enabled the course to meet national
(Portuguese) requirements. Members of USDIS staff
contribute about 25 man-weeks of teaching to the
course at INETI, as well as preparing the self-study
materials and assisting with dissertation supervision.

Within the first intake, which lasted from April 
1991 to September 1992, 15 students were awarded 
the MSc in Information Management degree. In the 
second intake, which started in June 1992, 24 stu- 
dents are finalizing their dissertations. 

Through the research component of the information
management curriculum, the students in the MSc
programme are designing and developing projects
which not only address problems experienced by the
Portuguese industrial and business sectors, so helping
to increase SMEs' awareness of information, but also
produce prototypes of new information products and
services. This may offer some guidance to those
Train-ISS students who wish to continue to the
Master's level (see para. 6).

The research component of the course provides 
the students with the opportunity to examine, inter- 
pret and review related research and then to develop 
an independent research project. This component 
consists of a module on research methods, research 
proposal development and the research leading to a 
dissertation. 

The dissertations produced by the students repre- 
sent a wide range of interests, from those associated 
with the development of more effective information 
systems in business and industry to those that deal 
with significant aspects of national information 
policy. 

Examples of the former include: 

i) Sistemas de análise da concorrência
(Competitor Intelligence Systems)

A study of the external sources of information
available to an organization when developing
its marketing strategy. The study involved
interviews with managers of seven large
Portuguese companies and an analysis of existing
competitive intelligence systems.

ii) Associação Industrial Portuense: a divisão de
informação e análise económica, situação e
perspectivas
(Associação Industrial Portuense: the information
and economic analysis division, situation and
prospects)

This was a broad study surveying the information
resources and flows within Associação Industrial
Portuense (AIPortuense), which sought to match
these with perceived user needs. The study took
small and medium-sized enterprises as its user
sector, since these constitute some 70% of the
members of AIPortuense.
iii) Estudo da implementação de um projecto de
videotexto
(Study of the implementation of a videotext project)
This was a study to determine the likely impact
of a specific information product, videotext, on
the same user sector as (ii) above. It formed the
basis for a videotext system to be implemented
by AIP, which will connect its headquarters with
all its members.
iv) Teaching PASCAL: A tutorial system in
GUIDE
This study was concerned with the design of a
hypertext system for teaching the PASCAL
programming language. After an introduction
to hypertext, the author describes in detail the
structure of the PASCAL lessons, drawing on
his experience of teaching the subject at the
University of Algarve.
v) Analysis and design of a production
management system (shipyard) using the
Oracle CASE tool
This study was concerned with the analysis
and design of an information system to support
the planning and control of shipyard production
activities. The work was undertaken using
Oracle CASE structured methodology and
techniques.




Examples of the policy-related work include: 
i) National Information Policy: A study of the
Portuguese legal framework (1989-1992)
National legislation published in the Diário
da República, 1.ª Série, from January 1980 to
June 1992, was reviewed to identify documents
relating to a national information policy and
the legislative initiatives taken to formulate
such a policy. The study includes a
chronological presentation of all relevant
legislative documents found in the review and
the development of a database to facilitate
further searching of the legislation.

ii) Os problemas éticos dos profissionais de
informação
(The ethical problems of information professionals)
The main interest of this work is a practical
one: how Portuguese information professionals
act in their workplaces with regard to these
problems, and how a set of practical guidelines
may help to improve their professional
performance.

iii) Crime informático: Implicações para o
desenvolvimento do mercado de informação
(Computer crime: Implications for the development
of the electronic information market)
This study aimed to identify and characterize the
barriers to the expansion of electronic information
due to computer crime, and to present a
classification of the various types of computer
crime in Portugal, together with a comparison
with the European and international scene. The
results of this research are awaited by the Police
College and other departments of the Ministry of
Justice in Portugal.
iv) A oferta de formação profissional no sector
da informação
(Vocational training in the information sector)
An analysis of the vocational training currently
available for information professionals, with a
view to identifying gaps and recommending
improvements. This dissertation has already
generated considerable interest within the
Ministry of Education.

It is expected that, through the application of this
strategy, we can contribute to a better understanding
of the advantages of an efficient information
management function within Portuguese industries and
businesses, and also highlight the role of information
and its management in the economy and in society in
general.


6. Skills of information specialists trained by 
INETI and potentially by the Train-ISS advanced 
training programme in Information Management 
It is still too early to comment on the effect that
completing the Postgraduate Course for Information
Intermediaries or holding the MSc in Information
Management has on career and promotion prospects.
However, we believe that both courses have
contributed to the preparation of the much-needed
information management specialists who will operate
in diverse organizations in Portugal. It is also possible
to characterize the profile of the information manager
trained by these programmes, and potentially by
Train-ISS, by the roles the students are performing,
such as:

i) design and development of database systems
for business applications, taking account of
the latest developments in hypertext and
expert systems;

ii) information support work for senior and
middle executives, through an understanding of
their information needs and of sources of
business information, both published and
online;

iii) end-user support for users of corporate
computer services, assisting in the choice of
software packages and contributing to
user-education programmes;

iv) consultants in the electronic information
industry, with a particular understanding of
users' needs and a thorough grounding in the
basics of computer-based information systems;

v) consultants assisting in the design and
development of online information systems in the
information market, and user-support staff in
this industry;

vi) middle management roles in corporate
computer services, through their wider
understanding of management information needs.

Train-ISS is intended to develop these skills
through the implementation of the programme set
out below.


Semester One 
1. The information market in the LFRs (2 modules) 
2. Database design and development I (1 module) 


Semester Two
2. Database design and development II (1 module)
3. Database operations, marketing and user support
(2 modules)


Placement 
Minimum of one month placement with an experi- 
enced information provider. 


The students who complete one written examina- 
tion at the end of Semester One and two written 
examinations at the end of Semester Two, one piece 
of assessed coursework for each of the three main 
course components and a placement report, will be 
awarded a Diploma in Electronic Information Man- 
agement. 

The coursework will be organized around the 
problem of the development of a business plan for a 
commercial database, with each module contribut- 
ing the appropriate technical, marketing and personal 
elements. 

Those who wish to continue and submit a disser- 
tation on an approved topic within one year of 
obtaining the diploma will be awarded the MSc in 
Electronic Information Management. 


7. Conclusions 
It is obvious that we need to educate our young 
people to value and manage our information re- 
sources properly. University courses in all disciplines 
should include the basics of search and retrieval so
that our graduates expect and demand adequate
information services throughout their working lives.

In addition, projects such as those described here
(the Postgraduate Course for Information
Intermediaries, the MSc in Information Management
and the Train-ISS course) will provide the much-needed
information specialists required for the development
of the information market in the LFRs.
Abstract

The paper argues that the IBM statistical approach to machine translation has done rather better after a few years 
than many sceptics believed it could. However, it is neither as novel as its proponents suggest nor is it making 
claims as clear and simple as they would have us believe. The performance of the purely statistical system (and we 
discuss what that phrase could mean) has not equalled the performance of SYSTRAN. More importantly, the 
system is now being shifted to a hybrid that incorporates much of the linguistic information that it was initially 
claimed by IBM would not be needed for MT. Hence, one might infer that its own proponents do not believe ‘pure’ 
statistics sufficient for MT of a usable quality. In addition to real limits on the statistical method, there are also 
strong economic limits imposed by their methodology of data gathering. However, the paper concludes that the
IBM group have done the field a great service in pushing these methods far further than before, and in reminding
everyone of the virtues of empiricism in the field and the need for large scale gathering of data.


History 

Like connectionism, statistically-based machine 
translation is a theory one was brought up to believe 
had been firmly locked away in the attic, but here it 
is back in the living room. Unlike connectionism, it 
carries no psychological baggage, in that it seeks to 
explain nothing and cannot be attacked on grounds 
of its small scale, as connectionist work has been. On
the contrary, that is how it attacks the rest of us.

‘It is well known that Western Languages are 
50% redundant. Experiment shows that if an average 
person guesses the successive words in a completely 
unknown sentence he has to be told only half of 
them. Experiment shows that this also applies to 
guessing the successive word-ideas in a foreign
language. How can this fact be used in machine
translation?'

Alas, that early article told us little by way of an 
answer and contained virtually no experiments or 
empirical work. Like IBM’s approach it was essen- 
tially a continuation of the idea underlying Weaver’s 
original memorandum on MT: that foreign languages 
were a code to be cracked. I display the quotation as 
a curiosity, to show that the idea itself is not new and 
was well known to those who laid the foundations of 
modern representational linguistics and AI. 

I personally never believed Chomsky's arguments
in 1957 against theories other than his own, any
more than I believed what he was for: his attacks on
statistical and behaviourist methods (as on
everything else, like phrase structure grammars) were
always in terms of their failure to give explanations,
and I will make no use of such arguments here,
noting as I do so how much I resent IBM's use of
'linguist' to describe everyone and anyone they are
against. There is a great difference between
linguistic theory in Chomsky's sense, as motivated
entirely by the need to explain, and theories, whether
linguistic/AI or whatever, as the basis of procedural,
application-engineering-orientated accounts of
language. The latter stress testability, procedures,
coverage, recovery from error, non-standard
language, metaphor, textual context, and the interface
to general knowledge structures.

Like many in NLP and AI, I was brought up to oppose linguistic methods on exactly the grounds IBM do: their practitioners were uninterested in performance, and in success at MT in particular. Indeed, the IBM work to be described here has something in common with Chomsky's views, which formed the post-1957 definition of ‘linguist’. It is clear from Chomsky's description of statistical and Skinnerian methods that he was not at all opposed to relevance/pragmatics/semantics-free methods (he advocated them, in fact); it was only that, for Chomsky, the statistical methods advocated at the time were too simple to do what he wanted to do with transformational grammars. More recent developments in finite state (as in Phrase Structure) grammars have shown that Chomsky was simply wrong about the empirical coverage of simple mechanisms.

In the same vein he dismissed statistical theories of language on the ground that sentence pairs like:

I saw a the
I saw a triangular whole

are equally unlikely but utterly different in that only the first is ungrammatical. It will be clear that the IBM approach discussed here is not in the least attacked by such an observation.


Is the debate about empiricism? No. 

Anyone working in MT, by whatever method, must care about success, insofar as that is what defines the task. Given that, the published basis of the debate between rationalism and empiricism in MT is silly: we are all empiricists and, to a similar degree, we are all rationalists, in that we prefer certain methodologies to others and will lapse back to others only when our empiricism forces us to. That applies to both sides in this debate, a point I shall return to.

An important note before continuing: when I refer to IBM machine translation I mean only the systems referred to at the end by Brown et al. IBM as a whole supports many approaches to MT, including McCord’s Prolog-based symbolic approach, as well as symbolic systems in Germany and Japan.


Is the debate about how we evaluate MT? No. 

In the same vein, I shall not, as some colleagues on my side of the argument would like, jump ship on standard evaluation techniques for MT and claim that only very special and sensitive techniques (usually machine-aided techniques to assist the translator) should in future be used to assess our approach.

MT evaluation is, for all its faults, probably in 
better shape than MT itself, and we should not change 
the referee when we happen not to like how part of 
the game is going. Machine-aided translation (MAT) 
may be fine stuff but IBM’s approach should be 
competed with head on by those who disagree with 
it. In any case, IBM's method could in principle 
provide, just as any other system could, the first draft 
translation for a translator to improve online. The 
only argument against that is that IBM's would be a 
less useful first draft if a user wanted to see why 
certain translation decisions had been taken. It is a 
moot point how important that feature is. However, 
and this is a point Slocum among others has made 
many times, the evaluation of MT must in the end be 
economic, not scientific. It is a technology and must
give added value to a human task. The ALPAC 
report, it is often forgotten, was about the economics 
of contemporary MT, not about its scientific status: 
the report simply said that MT at that time was not 
competitive, quality for quality, with human 
translation. 

SYSTRAN won that argument later by showing 
there was a market for the quality it produced at a 
given cost. We shall return to this point later, but I 
make it now because it is one that does tell, in the 
long run, on the side of those who want to emphasize 
MAT. But for now, and for any coming showdown between statistically and non-statistically based MT (where the latter will probably have to accept SYSTRAN as their champion for the moment, like it or not), we might as well accept existing 'quasi-scientific' evaluation criteria: Cloze tests, test sets of sentences, improvement and acceptability judged by monolingual and bilingual judges, etc. None of us in this debate and this research community are competent to settle the economic battle of the future, decisive though it may be.


Arguments not to use against IBM 

There are other well-known arguments that should not be used against IBM, such as that much natural language is metaphorical, that this applies to MT as much as to any other NLP task, and that statistical methods cannot handle it. This is a weak but interesting argument: the awful fact is that IBM cannot even consider a category such as metaphorical use. Everything comes out in the wash, as it were, and it either translates or it does not, and you cannot ask why. Much of their success rate of sentences translated acceptably probably comes from metaphorical uses. There may be some residual use for this argument concerned with very low frequency types of deviance, as there is for very low frequency words themselves, but no one has yet stated this clearly or shown how their symbolic theory in fact gets such uses right (though many of us have theories of that). IBM resolutely deny the need for any such special theory, for scale is all that counts for them.


What is the state of play right now?

Away with rumour and speculation; what is the true state of play at the moment? In recent reported but unpublished DARPA-supervised tests the IBM system CANDIDE did well, but significantly worse than SYSTRAN's French-English system over texts on which neither IBM nor SYSTRAN had trained. Moreover, CANDIDE had far higher standard deviations than SYSTRAN, which is to say that SYSTRAN was far more consistent in its quality (just as the control human translators had the lowest standard deviations across differing texts). French-English is not one of SYSTRAN's best systems, but this is still a significant result. It may be unpleasant for those in the symbolic camp, who are sure their own system could, or should, do better than SYSTRAN, to have to cling to it in this competition as the flagship of symbolic MT, but there it is. IBM have taken about four years to get to this point. French-English SYSTRAN was getting to about IBM's current levels after three to four years of work. IBM would reply that they are an MT system factory, and could do the next language much faster. We shall return to this point.


What is the distinctive claim by IBM about how to do MT?

We need to establish a ground zero on what the IBM system is: their rhetorical claim is (or perhaps was) that they are a pure statistical system, different from their competitors, glorying in the fact that they did not even need French speakers. By analogy with Searle's Chinese Room, one could call theirs a French Room position: MT without a glimmering of understanding or even knowing that French was the language they were working on! There is no space here for a detailed description of IBM’s claims (see Brown et al.). In essence, the method is an adaptation of one that worked well for speech decoding.
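For readers unfamiliar with the speech-decoding framing, the general technique is conventionally written as a noisy-channel model. The display below is the textbook form, oriented here for English input e and French output f, with the usual trigram factorization of the output-language model; the notation is standard rather than taken from IBM's unpublished systems.

```latex
\hat{f} \;=\; \arg\max_{f} P(f \mid e) \;=\; \arg\max_{f} \, P(f)\, P(e \mid f)

P(f_1 \dots f_m) \;\approx\; \prod_{i=1}^{m} P(f_i \mid f_{i-2}, f_{i-1})
```

Bayes' rule lets the constant P(e) drop out of the maximization: P(f) is the language model for the output language and P(e | f) the translation (correspondence) model.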

The method establishes three components: (a) a trigram model of English sequences; (b) the same for French; (c) a model of quantitative correspondence of the parts of aligned sentences between French and English. The first two are established from very large monolingual corpora in the two languages, of the order of 100 million words, the third from a corpus of aligned sentences in a parallel French-English corpus that are translations of each other. All three were provided by a large machine-readable subset of the French-English parallel corpus of Canadian parliamentary proceedings (Hansard). Components (a) and (b) are valuable independent of the language pair and could be used in other pairings, which is why they now call theirs a transfer model. In very rough simplification: an English sentence yields the likeliest equivalences for word strings (substrings of the English input sentence), i.e. French word strings. The trigram model for French rearranges these into the most likely order, which is the output French sentence. One of their most striking demonstrations is that their trigram model for French (or English) reliably produces (as the likeliest order for the components) the correct ordering of items for a sentence of ten words or less.
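The reordering step can be illustrated with a toy trigram model. This is a minimal sketch under stated assumptions: the three-sentence corpus is hypothetical, add-k smoothing and exhaustive search over orderings are choices made for illustration, and none of it is claimed to be IBM's implementation.

```python
import math
from collections import Counter
from itertools import permutations

BOS, EOS = "<s>", "</s>"  # sentence-boundary padding tokens


def train_trigrams(sentences):
    """Count trigrams and their two-word histories over whitespace-tokenized sentences."""
    tri, hist, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = [BOS, BOS] + sent.split() + [EOS]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            hist[tuple(toks[i - 2:i])] += 1
    return tri, hist, len(vocab)


def log_prob(words, tri, hist, v, k=0.1):
    """Add-k smoothed trigram log-probability of a word sequence (smoothing is an assumption)."""
    toks = [BOS, BOS] + list(words) + [EOS]
    total = 0.0
    for i in range(2, len(toks)):
        h = tuple(toks[i - 2:i])
        total += math.log((tri[h + (toks[i],)] + k) / (hist[h] + k * v))
    return total


def best_order(bag, tri, hist, v):
    """Score every ordering of a bag of words; feasible only for very short bags."""
    return max(permutations(bag), key=lambda p: log_prob(p, tri, hist, v))


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
    tri, hist, v = train_trigrams(corpus)
    print(best_order(["sat", "the", "cat"], tri, hist, v))  # ('the', 'cat', 'sat')
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow once sentences get longer.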

What should be emphasized is the enormous amount of pre-computation that this method requires and, even then, a ten-word sentence as input requires an additional hour of computation to produce a translation. This figure will undoubtedly reduce with time and hardware expansion, but it gives some idea of the computational intensity of IBM’s method.
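To give a feel for why even a ten-word input is expensive, the arithmetic below counts the candidate orderings of n words and the worst-case size of a trigram table. It is a back-of-envelope sketch only: practical decoders prune with heuristic search rather than enumerating every ordering, most possible trigrams never occur, and the 50,000-word vocabulary is an illustrative assumption, not a figure reported for IBM.

```python
import math


def orderings(n: int) -> int:
    """Number of distinct orderings of n distinct words (n factorial)."""
    return math.factorial(n)


def trigram_table_size(vocab_size: int) -> int:
    """Upper bound on distinct trigram parameters for a vocabulary of this size."""
    return vocab_size ** 3


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, orderings(n))  # prints: 5 120 / 10 3628800 / 15 1307674368000
    # 50,000 is an illustrative vocabulary size, not a figure reported for IBM.
    print(trigram_table_size(50_000))  # 125000000000000
```

Each extra input word multiplies the ordering count by the new length, which is why exhaustive search is out of the question beyond very short sentences.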

The facts are now quite different. They have taken in whatever linguistics has helped: morphology tables, sense tagging (which is directional and dependent on the properties of French in particular), a
transfer architecture with an intermediate represen- 
tation, plural listings, and an actual or proposed use 
of bilingual dictionaries. In one sense, the symbolic 
case has won: they topped out by pure statistics at 
around 40% of sentences acceptably translated and 
then added whatever was necessary from a symbolic 
approach to upgrade the figures. No one can blame 
them: it is simply that they have no firm position 
beyond taking whatever will succeed, and who can 
object to that? 

There is then no theoretical debate at all, and their 
rhetorical points against symbolic MT are in bad faith.
It is Stone Soup: the statistics are in the bottom of the 
pot but all flavour and progress now come from the 
odd trimmings of our systems they pop into the pot. 

They are, as it were, wholly pragmatic statisticians: less pure than, say, the Gale group at AT&T. This is easily seen in IBM's introduction of notions like the one they call ‘informants’, where a noun phrase of some sort is sought before a particular text item of interest. This is an interpolation of a highly theoretically loaded notion into a routine that, until then, had treated all text items as mere uninterpreted symbols.

One could make an analogy here with localist 
versus distributivist sub-symbolic connectionists: the 
former, but not the latter, will take on all kinds of 
categories and representations developed by others 
for their purposes, without feeling any strong need to 
discuss their status as artifacts, i.e. how they could
have been constructed other than by handcrafting. 

This also makes it hard to argue with them. So, also, does their unacademic habit of telling you what they've done but not publishing it, allegedly because (a) they are advancing so fast and (b) they have suffered rip-offs. One can sympathize with all this, but it makes serious debate very hard.


The only issue

There is only one real issue: is there any natural 
ceiling of success to pure statistical methods? The 
shift in their position suggests there is. One might 
expect some success with those methods on several 
grounds (and therefore not be as surprised as many 
are at their success): 

a) there have been substantial technical advances in 
statistical methods since King’s day and, of course, 
in fast hardware to execute such functions, and in 
disk size to store the corpora; 

b) the redundancy levels of natural languages like English are around 50% over both words and letters. One might expect well-optimized statistical functions to exploit that to about that limit, with translation as much as any other NLP task. One could turn this round in a question to the IBM group: how do they explain why they get, say, 40-50% or so of sentences right, rather than 100%? If their answer refers to the well-known redundancy figure above, then the ceiling comes into view immediately. If, on the other hand, their answer is that they cannot explain anything, or that there is no explaining to do or discussion to have, then their task and methodology are very odd indeed. Debate and explanation are made impossible and, where that is so, one is normally outside any rational or scientific realm. It is the world of the witch-doctor: look, I do what I do and notice that (sometimes) it works;

c) according to a conjecture I propounded some years 
ago, with much anecdotal support, any theory 
whatever, no matter how bizarre, will do some MT.
Hence my surprise level is always low. 
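The redundancy figure in (b) has a precise information-theoretic reading: R = 1 - H / log2(N), where H is the source's entropy per symbol and N the alphabet size. The sketch below computes only a zero-order (unigram) estimate, an assumption made for simplicity; because it ignores context it understates the redundancy of real text, and estimates near 50% come from context-sensitive prediction such as the guessing experiments quoted earlier.

```python
import math
from collections import Counter


def unigram_entropy(symbols) -> float:
    """Shannon entropy, in bits per symbol, of the empirical unigram distribution."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redundancy(symbols, alphabet_size: int) -> float:
    """R = 1 - H / log2(alphabet_size): the share of the alphabet's capacity left unused."""
    return 1.0 - unigram_entropy(symbols) / math.log2(alphabet_size)


if __name__ == "__main__":
    print(redundancy("abcd", 4))  # 0.0: uniform use of all four symbols
    print(redundancy("aabb", 4))  # 0.5: only half the capacity is used
    print(redundancy("aaaa", 4))  # 1.0: fully predictable
```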


Other reasons for expecting a ceiling to success with statistics

Other considerations that suggest there is a ceiling to pure statistical methods are as follows:

i) a parallel with statistical information retrieval may be suggestive here: it generally works below the 80% threshold, and the precision/recall tradeoff seems a barrier to greater success by those methods. Yet it is, by general agreement, an easier task than MT and has been systematically worked on for over 35 years, unlike statistical MT, whose career has been intermittent. The relationship of MT to IR is rather like that of sentence parsers to sentence recognizers. A key point to note is how rapid the early successes of IR were, and how slow the optimization of those techniques has been since then!

ii) a technical issue here is the degree of their reliance on alignment algorithms as a pre-process: at ACL-91 they claimed only 80% correct alignments, in which case how could they exceed the ceiling that that suggests?

iii) their model of a single language is a trigram model because moving up to even one item longer (i.e. a quadgram model) would be computationally prohibitive. This alone must impose a strong constraint on how well they can do in the end, since any language has phenomena that connect outside the three-item window. This is agreed by all parties. The only issue is how far one can get with the simple trigram model (and, as we have seen, it gives a basic 40%), and how far long-distance phenomena in syntax can be finessed by forms of information caching. One can see the effort to extend the window as enormously ingenious, or as patching up what is a basically inadequate model when taken alone.
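The precision/recall tradeoff mentioned in (i) can be shown on a toy ranked list. The ranking and relevance judgements below are hypothetical, and precision/recall at a cutoff k is one standard way of measuring it, chosen here for illustration.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when the top k items of a ranked list are retrieved."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgements, for illustration only.
    ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
    relevant = {"d1", "d3", "d6"}
    for k in (1, 3, 6):
        p, r = precision_recall_at_k(ranked, relevant, k)
        print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Retrieving more items (larger k) raises recall here from 0.33 to 1.00 while precision falls from 1.00 to 0.50, which is the barrier the text has in mind.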


The future: hybrid approaches 

Given the early success of IBM’s methods, the most 
serious and positive question should be what kinds 
of hybrid approach will do best in the future: coming 
from the symbolic end, plus statistics, or from a 
statistical base but inducing, or just taking over, 
whatever symbolic structures help? For this we can 
only watch and wait, and possibly help a little here 
and there. However, there are still some subsidiary 
considerations. 


IBM, SYSTRAN and the economics of corpora

In one sense, what IBM have done is partially 
automate the SYSTRAN construction process: re- 
placing laborious error feedback with statistical 
surveys and lexicon construction. And all of us, 
including SYSTRAN itself, could do the same. 
However, we must always remember how totally 
tied IBM are to their Hansard text, the Rosetta 
Stone, one might say, of modern MT. We should 
remember, too, that their notion of word sense is 
only and exactly that of correspondences between 
different languages, a wholly unintuitive one for 
many people. 

The problem IBM have is that few such vast bilingual corpora are available in languages for which MT is needed. If, however, they had to be constructed by hand, then the economics of what IBM have done would change radically. By bad luck, the languages for which such corpora are available are also languages in which SYSTRAN has already done pretty well, so IBM will have to overtake, then widen the gap with, SYSTRAN’s performance a bit before they can be taken seriously from an economic point of view. They may be clever enough to make do with less than the current 100 million word corpora per language, but one would naturally expect quality to decline as they did so.

This resource argument could be very important: Leech has always made the point, with his own statistical tagger, that any move to make higher-level structures available to the tagger always ended up requiring much more text than he had expected.

This observation does not accord with IBM’s claims, which are rather the reverse, so an important point to watch in future will be whether IBM will be able to obtain adequate bilingual corpora for the domain-specialized MT that is most in demand (such as airline reservations or bank billings). Hansard has the advantage of being large but is very, very general indeed.


Why the AI argument about MT still has force

The basic AI argument for knowledge-based processing does not admit defeat and retreat; it just regroups. It has to accept Bar-Hillel’s old anti-MT argument on its own side, i.e. that, as he said, good MT must in the end need knowledge representations. One version of this argument is the primitive psychological one: humans do not do translation by exposure to such vast texts, because they simply have not had such exposure, and in the end how people do things will prove important. Note that this argument makes an empirical claim about human exposure to text that might be hard to substantiate. This argument will cut little ice with our opponents, but there may still be a good argument that we do need representations for tasks in NLP related to MT: e.g. we cannot really imagine doing summarization or question answering by purely statistical methods, can we? There is related practical evidence from message extraction: in the MUC competitions, the systems that have done best have been hybrids of preference and statistics, such as those of Grishman and Lehnert, and not pure systems of either type.

There is the related argument that we need access to representations at some point to repair errors: this is hard to make precise, but fixing errors makes no sense in the pure IBM paradigm; you just provide more data. One does not have to be a hard-line syntactician to have a sense that rules do exist in some linguistic areas and can need fixing.


Hard problems do not go away

There remain, too, crucial classes of cases that seem to need symbolic inference: an old, self-serving one will do, such as ‘The soldiers fired at the women and I saw several fall’.

I simply cannot imagine how any serious statistical method (i.e. not one like ‘pronouns are usually male, so make "several" in a gendered translation agree with soldiers’!) can get the translation of ‘several’ into a gendered language right (where we assume it must be the women who fall, from general causality). But again, one must beware here, since presumably any phenomenon whatever will have statistically significant appearances and can be covered by some such function if the scale of the corpus is sufficiently large. This is a truism and goes as much for logical relations between sentences as for morphology. It does not follow that that truism leads to tractable statistics or data gathering. If there could be 75,000-word-long Markov chains, and not merely trigrams (which seem the realistic computational limit), the generation of whole novels would be trivial. It is just not practical to have chains longer than three, but we need to fight the point in principle as well!
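The window limitation in the soldiers/women example can be made concrete. In this sketch (the helper names are hypothetical and not part of any system discussed), swapping ‘soldiers’ and ‘women’ changes which group falls, and so which gender ‘several’ should take, yet the trigram history of ‘several’ is identical in both sentences; the antecedent sits far enough back that only a 5-gram would even condition on it.

```python
def ngram_history(tokens, i, n=3):
    """The n-1 preceding tokens an n-gram model conditions on when predicting tokens[i]."""
    return tuple(tokens[max(0, i - (n - 1)):i])


def min_order_to_see(antecedent, target):
    """Smallest n whose history at position `target` still contains position `antecedent`."""
    return target - antecedent + 1


if __name__ == "__main__":
    a = "The soldiers fired at the women and I saw several fall".split()
    b = "The women fired at the soldiers and I saw several fall".split()
    t = a.index("several")
    # Both sentences present an identical trigram history for 'several'.
    print(ngram_history(a, t), ngram_history(b, t))  # ('I', 'saw') ('I', 'saw')
    print(min_order_to_see(a.index("women"), t))  # 5
```

Any trigram model, however well trained, must therefore assign the same distribution at that position in both sentences.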

Or, consider the following example (due to Sergei 
Nirenburg): 

PRIEST IS CHARGED WITH POPE ATTACK (Lisbon, May 14)

A Spanish priest was charged here today with attempting to murder the Pope. Juan Fernandez Krohn, aged 32, was arrested after a man armed with a bayonet approached the Pope while he was saying prayers at Fatima on Wednesday night.

According to the police, Fernandez told the 
investigators today he trained for the past six 
months for the assault. He was alleged to have 
claimed the Pope ‘looked furious’ on hearing 
the priest’s criticism of his handling of the 
church’s affairs. If found guilty, the Spaniard 
faces a prison sentence of 15-20 years. 

(The Times 15 May 1982) 

The five italicized phrases all refer to the same 
man, a vital fact for a translator to know since some of 
those phrases could not be used in any literal manner 
in another language (e.g. ‘the Spaniard’ could not be 
translated word-for-word into Spanish or Russian). It
is hard to imagine multiple identity of reference like 
that having any determinable statistical basis. 


Is the Pure Statistics argument what is being debated? No.

Everything so far refers to the pure statistics argu- 
ment, from which IBM have now effectively backed 
off. If the argument is then to be about the deploy- 
ment of hybrid systems and exactly what data to get 
from the further induction of rules and categories 
with statistical functions (e.g. what sort of dictionary 
to use) then there is really no serious argument at all, 
just a number of ongoing efforts with slightly differ- 
ing recipes. Less fun, but maybe more progress, and 
IBM are to be thanked for helping that shift. 


IBM as pioneers of data acquisition 

I can add a personal note here: when I worked on what I then called Preference Semantics at McCarthy’s Stanford AI Lab, he always dealt briefly with any attempt to introduce numerical methods
into AI — statistical pattern-matching in machine 
vision was a constant irritation to him — by saying 
‘Where do all those numbers COME from?’ I felt a 
little guilty as Preference Semantics also required at 
least link counting. One could now say that IBM's 
revival of statistical methods has told us exactly 
where some of these numbers come from! But that 
certainly does not imply that the rules that express 
the numbers are therefore useless or superseded. 

This touches on a deep metaphysical point: I 
mentioned above that we may feel word-sense is a 
non-bilingual matter, and that we feel that there are 
rules that need fixing sometimes and so on. Clearly, 
not everyone feels this. But it is our culture of language study that tells us that rules, senses, metaphors, representations etc. are important, and we cannot imagine that all that is just a cultural artefact. An analogy here would be Dennett's recently restated theory of human consciousness, which suggests that all our explanations of our actions, reasons, motives, desires etc. as we articulate them may be no more than fluff on the underlying mechanisms that drive us.

IBM's work induces the same terror in language 
theorists, AI researchers and linguists alike: all their 
dearly-held structures may be just fluff, a thing of 
schoolmen having no contact with the reality of 
language. Some of us in AI, long ago, had no such trouble imagining that most linguistics was fluff, but do not want the same argument turned round on us: that all symbolic structures may have the same status.

Another way of looking at this is how much good 
IBM are doing us all: by showing us, among other 
things, that we have not spent enough time thinking 
about how to acquire, in as automatic a manner as 
possible, the lexicons and rule bases we use. This has 
been changing lately, even without IBM's influence, 
as can be seen from the large-scale lexical extraction 
movement of recent years. But IBM's current attempts to recapitulate, as it were, in the ontogeny of their system, much of the phylogeny of the AI species are a real criticism of how some of us have spent the last twenty years.

We have not given enough attention to knowl- 
edge acquisition, and now they are doing it for us. I 
used to argue that AI-ers and computational linguists
should not be seen as the white-coated laboratory
assistants of linguistic theorists (as some linguists 
used to dream of using us). Similarly, we cannot wait 
for IBMers to do this dirty work for us while we go 
on theorizing. Their efforts should change how the 
rest of us proceed from now on. 


Let us declare victory and carry on working

Relax, go on taking the medicine. Brown et al.'s retreat to incorporating symbolic structures shows the pure statistics hypothesis has failed. All we should be haggling about now is how best to derive the symbolic structures we use, and will go on using, for machine translation.
InfoMapper revisited

Letter to the Editor

A letter to the Editor from Forest Woody Horton, Jr.


Perhaps my most serious misgiving about Charles Oppenheim’s article in the February 94 issue of Aslib Proceedings is that he has approached the InfoMapper software with certain prejudicial and stereotyped concepts of: 1) what InfoMapper really is, and is supposed to accomplish; 2) what database management systems are; 3) the difference between a computer programming language, such as ‘C’ or ‘C++’ (in which InfoMapper was programmed), a generalized DBMS software package that is sold ‘off the shelf’, and an applications software package like InfoMapper; and 4) his failure to follow clear instructions in the documentation accompanying InfoMapper (especially the Project Managers Guide), which, in turn, led to all manner of problems that could have been entirely precluded, or at least ameliorated, and for which he blames the software rather than a lack of adequate preparation, planning, and time for the client to work through problems.

One of the most serious errors Oppenheim made 
was at the very outset in not allowing sufficient time 
to prepare for, plan, and implement the inventory 
project. By his own admission he says ‘This research 
was undertaken over a three month period. Horton 
has recommended that the minimum length for this 
sort of study should be six months. Due to these time 
constraints, we only studied a proportion of the 
information needs and parameters outlined by TLC.’ 

Three months is totally inadequate to cope with 
the large number of preliminary steps that must be 
addressed in order to get ready to undertake an 
inventory of information resources. Oppenheim 
seems to have taken the position that time was the 
least critical variable in his research agenda. This 
was a serious miscalculation, and calls into question
many of his findings and conclusions. 

Another error he committed was inadequate manpower to support his test project. Again, by his own admission he says ‘Another of Horton’s specifications is for a project team to be allocated to various stages of the methodology.’ Oppenheim’s answer to a ‘team’ was one member of his own staff plus one member of the client’s organization: woefully inadequate! He seems to think that the only negative side of that miscalculation was that he could have collected more data with more people! What an utterly naive assumption! There are far more important considerations that are dependent on adequate manpower staffing than the relatively minor question of data collection volume.


On the other hand, one of the good things that 
apparently happened early in the project was during 
the interview stage, when Oppenheim says ‘the inter- 
views were based on the questionnaire (for data 
collection), but they moved towards open discussions 
about the information resources.’ This was a healthy
and constructive development, because one of
InfoMapper’s underlying goals is to be provocative. 
That is, to deliberately get the organization to begin 
thinking consciously about their information resources 
— what are they, where are they, who uses them, who 
does not use them, who should use them, and so on. 

Answers to those questions are not ‘on the tip of
the tongue,’ and they must be ferreted out. And, in a
very real sense, getting the organization to focus on 
those questions is far more important than merely 
‘filling out another questionnaire.’ After all, con- 
structing a solid baseline of an organization's 
inventory resources for the first time is bound to be a 
difficult process at best — if for no other reason than 
it's never been done before! The Strathclyde team 
seems to think that filling out the inventory question- 
naire was some kind of bothersome and mundane 
survey that one needed to get over with quickly. 

Oppenheim complains at several points about the 
speed of generating reports, and carrying out other 
functions, despite the fact that he used a (fast) 486 
PC. There are several explanations for this. Typi- 
cally, Oppenheim jumps to the conclusion that 'it 
appears to be the fault of the software.’ But without 
further information it is impossible to even guess 
what his root cause of the speed problem may have 
been, or what combination of factors. 

First of all, despite clear admonitions in the documentation, many InfoMapper clients do not use a cache to speed up internal operations, even though a dBASE runtime cache is included in the InfoMapper files. That is probably the commonest reason why the application may run slowly. Secondly, a ‘fast 486’ does not tell us what the RAM size was, or whether TSR programs may have slowed the program down because of an inordinate volume of file swapping between memory and the hard disk. Thirdly, many clients have never even run 'chkdsk /f' to see if they have lost files and clusters; these, too, can slow and frustrate optimal program execution. There are other reasons as well, but Oppenheim never contacted Information Management Press even once during this test to ask questions or seek advice.
Oppenheim complains because none of the nine organizational class alternatives seemed ideally suited to TLC’s organizational situation, and he therefore had to do a lot of tinkering with the answer choices for the seven user-defined fields in the record. He also points out that many of the terms had to be translated from ‘American English’ to ‘British English.’ Indeed, the application was developed in the United States, but there were a number of United Kingdom organizations that participated in the beta test, as well as beta test participants in many other countries.

Still, the problem, to some degree, is virtually
unavoidable. That is a fair criticism. Two thoughts 
come to mind, however. One is that there is a tenth 
‘user-defined class’ which Oppenheim should have 
used, where he works up his own answer choices for 
the seven fields ‘from scratch.’ The other thought is 
that there are now available French, Spanish and 
German versions of InfoMapper. Perhaps there should 
be a British version as well? The developer would be 
interested in hearing from readers on this score.

Next, Oppenheim seems to have misunderstood 
the minimum field requirement. The only fields that 
are mandatory before a record can be created are the 
IRE Name, Number, and the information called for 
in field three on the first screen (e.g. whether the IRE 
is standardized or not, and whether it occurs in only 
one location or in many locations). In short, no more than six fields! If you only fill out those six fields, the record will ‘take,’ and you need never answer any of the other questions! Oppenheim objects because unused fields cannot be obliterated.

I concur that this may be a useful consideration in
a later release. Meanwhile, many clients devise their
own questionnaire using InfoMapper’s default questionnaire
as a guide, omitting fields they do not plan
to use.

The statement ‘it would take a full day to get a
clear picture of all the IREs held by an interviewee’
betrays a fundamental misunderstanding of the software’s
ultimate objectives. I don’t think a full day to
get a clear picture is too long; on the contrary, I think
it’s much too short! Most of us simply have never sat
down and asked ourselves ‘now, what information
resources do I rely upon to do my job? Which are
critical? Which are not so critical? Which are internal?
Which are external? What, indeed, are my
information requirements?’ To be sure, these are not
simple questions! Who ever said they were? They
are very difficult challenges, and it’s InfoMapper’s
task to help users begin to think consciously about
their information needs, instead of just assuming that
information is something vague and amorphous.

Oppenheim's suggestion that the user have greater 
control over the contents of the main entry report is 
an interesting one, and we will address that in consid- 
ering optional features for the next release. But 
Oppenheim errs in alleging that the user cannot
select a single record to print out. The MER routine
is not the place to do that. The place to do that is the
‘Print IRE Blank Form/Record’ feature: select
‘record’ in the sub-menu instead of ‘blank form’.
That will allow you to select a single record and print
it out.

Oppenheim’s criticism of the search limitations
of the software is taken constructively, and will be
dealt with definitively in the next release. Meanwhile,
the LAN version allows the user to search
three times as many fields as the standalone version.
Also, some clients have set up a simple
module wherein they copy the entire database into a
structured ASCII file, and search it that way using a
specialized search package.
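As a rough illustration of that export-and-search workaround, the sketch below writes records out as delimited ASCII (CSV) text and then searches that text with a regular expression. It is a minimal Python sketch and assumes the records can be exported as simple field/value pairs. The field names and sample records are hypothetical, not InfoMapper’s actual schema. Python’s standard `csv` and `re` modules stand in for the specialized search package.

```python
import csv
import io
import re

# Hypothetical records; these field names are illustrative, not InfoMapper's schema.
records = [
    {"ire_number": "001", "ire_name": "Client database", "location": "London"},
    {"ire_number": "002", "ire_name": "Translation archive", "location": "Leeds"},
]


def export_to_ascii(rows):
    """Write the records out as delimited ASCII text (CSV) and return it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def search_ascii(text, pattern):
    """Return each exported record in which any field matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if any(regex.search(value) for value in row.values())]


exported = export_to_ascii(records)
print(search_ascii(exported, "archive"))
```

Any external search tool could take the place of `search_ascii`. The point is only that a plain delimited file frees the data from the application’s own search limits.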

We appreciate the positive comments the Strathclyde
team makes about the utility of the Names,
Organizations, Locations, and Hardware/Software
indexes, and the Cross Reference Report.

Oppenheim is correct in observing that InfoMapper
is not for the completely uninitiated user
who is not computer literate. A certain minimal
level of computer (and information) literacy, while
not absolutely mandatory, is strongly encouraged to
save time and unnecessary effort. That is why the
documentation includes an Instructors’ Manual, so
that appropriate education and training can precede
full implementation of the software.

We are particularly happy that Oppenheim should
recommend that users first read the Burk-Horton text
before proceeding with the inventory (it was originally
published by Prentice-Hall but is now available
from Information Management Press). The book
lays out the theory behind the information mapping
idea and explains the basic concept in detail, and,
thereby, gives the InfoMapper user a head start in
understanding why certain features and functions
operate the way they do, and what they are really
expected to accomplish.

But perhaps Oppenheim's most serious distor- 
tion is his failure to make the crucial distinction 
between: 1) a piece of application software such as 
InfoMapper; 2) a general-purpose DBMS package 
like dBASE; and 3) a programming language such as 
‘C.’ He seems to believe they are all functional and 
value-for-money equivalents — virtually interchange- 
able! He says a user could develop his own 
InfoMapper ‘with a little programming’ using a pack- 
age like dBASE! 

A ‘little programming’? What a gross understatement!
InfoMapper contains over a half million
lines of copyrighted C code! It would take the most 
expert C programmer, and one who is also intimately 
familiar with the dBASE software, nearly a year, and 
cost between $500,000 and $1 million to design and 
develop a comparable package and bring it to the 
beta test stage! Not to mention the testing and debug- 
ging time involved after that! 

I dare Professor Oppenheim to undertake that 
task and bring up a package of equivalent complexity 
in less time and with less money! 
But the central point is: why spend that kind of
time and money when you can purchase a ‘shelf’
package for around US$600 and get something you
can use immediately, albeit not a ‘perfect’ package?

Would Oppenheim program his own word processing
package from scratch just to accommodate some
additional features he desires? Would he program his
own spreadsheet application package, such as Lotus,
for the same reason? Either course of action would be
outrageously expensive and would betray a total
misunderstanding of the magnitude of the tasks involved. In
short, there comes a point of marginal utility beyond which
programming a job from scratch is simply not cost-effective
compared with buying shelf software,
even though the shelf software may not have
100%, or even 80%, of the ideal features sought.

At one point, for example, he says InfoMapper ‘is
simply a version of dBASE IV but without the latter’s
searching capabilities and flexibility of output’.
That is also an utterly false and misleading statement.
He does admit that ‘with an ordinary database
software package such as dBASE IV, the user would
have to do a lot of programming to get the user-friendly
screens that InfoMapper provides’. Elsewhere,
he concedes that using a DBMS such
as dBASE IV ‘would result in a lower user interface
standard than InfoMapper’.
Finally, I am very sad that it is always the client who
‘suffers’ in situations of this kind, where a
consultant or advisor simply does not do their homework.
Had Professor Oppenheim done his homework,
he would have been forewarned, and thus forearmed
with answers and informed choices, as he confronted
problems. Even if he had not done his homework, had
he contacted Information Management Press to
discuss problems and plans, we might have helped
him deal with them more effectively.

It is my sincere hope that TLC will be able to deal
effectively with its information management challenges.
Given a proper chance, I earnestly believe
InfoMapper can be of major assistance.

InfoMapper is the first product of its kind in the
marketplace. It is currently the flagship product in its
field, and has no cost-effective competition. Any product
that dares to appear first on the market, especially a
software product, can always be improved. We at
Information Management Press are working to improve
InfoMapper, and expect to be able to add some
very powerful new features in the next standalone
release. To that end, we would appreciate feedback
from users. Despite several seriously misleading
statements, Professor Oppenheim’s critique still contains
some valuable feedback, and for that we are
grateful.


Aslib Proceedings, vol.46, no.4 


ONLY £18

Implementing BS5750 ISO9000 in libraries
Debbie Ellis and Bob Norton

• Helps improve productivity
• Saves time
• Improves internal communications
• Boosts morale
• Reduces costs by removing grey areas
• Identifies problems
• Assesses efficiency
• Helps focus an organization on what it does and how well it is doing it

For the public sector the impact of competitive
tendering makes BS5750 an important guideline.
For the private sector added value to services
demonstrates commitment to customers. Checking
customer satisfaction is now a basic requirement.
Formal recognition of reliability and quality is not
only desirable but in many cases obligatory. With its
flowcharts and diagrams this new book takes you
through the steps necessary to obtain the benchmark of
quality.

‘I very much welcome the publication of this clear and
concise guide to the application of BS5750 to
libraries and information services. In the pursuit of
quality it is important that professionals make full use
of the whole range of tools and techniques at their
command: Ellis and Norton’s book will make the BS5750
quality management standard far more accessible
to practitioners in our field.’
Professor Peter Brophy, Librarian,
University of Central Lancashire

• Whoever wishes to achieve and sustain the quality of
their service should buy this book.
• Whoever is looking to provide confidence to their
own management that intended quality is being
achieved and maintained should use this guide.
• Whoever aims to provide client satisfaction that the
intended quality is being and will be delivered should
familiarize themselves with the detailed schedules and
checklists the authors have included.

236 x 154mm; x, 123pp
0 85142 315 9 paperback
UK and Europe: £18 (£14*)
Rest of the World: £22 (£18*)
* Aslib Corporate Members

For orders contact:
THE ASSOCIATION FOR INFORMATION MANAGEMENT
Information House, 20-24 Old Street, London EC1V 9AP.
Tel: +(44) 71 253 4488 Fax: +(44) 71 430 0514






MANAGEMENT PACKAGES FOR TRANSLATORS
FOREIGN LANGUAGES IN WORDPERFECT
THE SUNDIAL PROJECT

May 1994








Aslib Proceedings carries papers given at Aslib meetings and 
conferences, and contributed papers of interest to practising 
professionals. These describe innovations in information management 
techniques, applications of existing information products and services, 
case studies of information units and special libraries. 


Aslib Proceedings is published ten times per annum and supplied free 
to corporate members of Aslib. For 1994 the annual subscription rates 
are £117 in the UK and Europe, £127 in the rest of the world. Single 
copy prices and subscription information are available from the 
Subscriptions Department at Aslib. Advertising information is available 
from Brian Thackray, Publications Department. Further information 
about Aslib’s services and membership rates will be supplied on 
application to the Membership Department, Aslib, The Association for 
Information Management, Information House, 20-24 Old Street, London
EC1V 9AP. Telephone: +(44) 71 253 4488; Fax: +(44) 71 430 0514.


Manuscripts and editorial enquiries should be addressed to the Editor, 
Moira Duncan. Manuscripts should be 2,000 words upwards, 
typewritten, double-spaced, single-sided on A4 paper. Figures should 
be reproducible. An abstract of up to 200 words should be included. 
References should follow British Standard 5605: 1990. Published 
authors will receive 10 copies of their paper. 


Aslib, The Association for Information Management has some 
two thousand corporate members worldwide. It actively promotes 
better management of information resources. 


Aslib lobbies on all aspects of the management of and legislation
concerning information. It provides consultancy and information
services, professional development training, specialist
recruitment and publishes primary and secondary journals,
conference proceedings, directories and monographs.


Further information about Aslib can be obtained from:


Aslib, The Association for Information Management 
Information House 
20-24 Old Street 
LONDON EC1V 9AP 


Tel: +(44) 71 253 4488 Fax: +(44) 71 430 0514 Internet: aslib@demon.co.uk





©1994 Aslib and contributors ISSN 0001 253X 
Printed in Great Britain by Chappell Graphics and Print Services 


Aslib Proceedings

Contents

Development of a management package for translators
in translation management
Peter Barber, pp.123-130

Controlled English with and without machine translation
Arthur Lee, pp.131-133

Foreign languages in WordPerfect
Peter Kahrel, pp.135-140

The SUNDIAL speech understanding and dialogue project:
results and implications for translation
Norman M Fraser, pp.141-148








Aslib Proceedings, vol.46, no.5, May 1994: pp.121-148







Development of a management package for translators 
in translation management 


Peter Barber
60 Warren Way, Digswell, Welwyn, Herts AL6 0DN


Paper presented at Machine Translation Today, a conference jointly sponsored by Aslib, the Association for 
Information Management, the Aslib Technical Translation Group, the Institute of Translation and Interpreting, and 
the European Association of Machine Translation on 18-19 November 1993, at the CBI Conference Centre, London. 


Abstract 

This paper examines some of the problems of day-to-day management and control of work passing through a busy 
translation office, aspects which are common to both translation companies and internal translation departments, 
such as maintenance of ‘client’ and ‘supplier’ databases, production of printed papers, statistics and control of 
‘work in progress’. It then passes on to consider some of the solutions available. One specific solution, which was
developed over a number of years in close collaboration with Bruce Carroll, a computer consultant, is the
Electronic Translations Manager (ETM). ETM is a standalone or network computer program written by
specialists for specialists, with the aim of minimising the repetition and routine drudgery of job and data handling.
In addition to ‘job’ management, ETM manages the ‘client’ and ‘supplier’ databases, and merges data from all
areas to provide instant information on the current and historical status of work. These data, when suitably
merged, also provide a wealth of statistical reports.


Introduction — generalities and definitions 

The first thing I need to do is to set the scene and the 
ground rules for the terminology I shall be using. 
Translation is a strange profession in that — apart 
from any other reasons that may come to your minds 
— the only thing that we have in common is the actual 
task of translation. We work with different source 
and target languages, in different directions, and 
with different subject fields. So it is with translation 
managers, people whose job it is to organize the 
doing of the work, keep track of clients and suppliers 
and their details, and provide the paperwork and 
statistical records that are generated by the passage 
of work through the system. 

Some of you work in translation departments, 
others are in translation companies. It is not impor- 
tant to make a distinction, for we all form part of a 
processing chain, receiving work from someone, 
giving it to someone to do, monitoring the progress 
of the work and from time to time producing data on 
what has been happening. 

There are various words used to signify the ‘work- 
giver’ (and that is one of them, a contrived generic 
term that betrays its German origin). Many of you, 
particularly those working within large companies 
or organizations, will want to talk of ‘requesters’ 
(this one comes from French). Some will prefer the
term ‘customer’, but, exercising my right of privi- 
lege in choosing my own term, I shall refer to them 
all as ‘clients’. 

Similarly, there is a choice of term to identify the 
person or organization which carries out the work. In 
this context, since the work may involve many things 
other than translation, we shall call them ‘suppliers’. 


Running very quickly through the sequence of 
events, therefore, a client comes to you with a re- 
quest to carry out a ‘job’, the progress of which is 
recorded in a ‘job history’; the job may consist of a 
number of parts, which we shall call ‘tasks’ and 
within each task are different ‘worktypes’, carried
out by different types of 'supplier', who may be 
called upon to do their work in ‘batches’. 

I shall introduce other terms at the appropriate 
point in the presentation, again with a definition of
how I am using each one in this context.

What we shall do now is take a closer look at the 
three essential areas of information that concern us, 
as translation managers. Incidentally, this term may 
be new to you, but the concept is not a new one. Some
three or four years ago, now, I promoted within the 
ITI the idea of introducing a new grade of member, 
the Translation Manager, with an Associate grade for 
the less-experienced. The criteria for membership 
were well-defined and, in fact, pretty strict — not an 
easy entry. The suggestion was enthusiastically en- 
dorsed by the Corporate Members of ITI, and by the 
Admissions Committee and Council, but rejected 
by, in my opinion, a short-sighted membership at the 
AGM when it was put to them for approval. How- 
ever, I digress, and have little enough time for the 
theme as it is. 

We shall look at the sort of detail we need to 
know about our clients and about our suppliers. 
We shall then pass on to look at the information 
needing to be recorded about each job that passes 
through and, finally, consider some of the ways in 
which the data can be used, both during and after 
the event. 
The objective of the exercise is to show how
readily the operation can be computerized, how much 
more efficient it can make our work, and I shall look, 
in very broad terms, at how this might be done. My 
own program, which is obviously at the back of all 
this thinking, will have a brief moment of exposure at 
the end, to show one way of implementing a solution. 
Putting it at the end means that I cannot be accused of 
blatant self-promotion since time will probably run out 
and I shall have to stop talking before I reach that point. 


Clients

The things we need to know about our clients are, of
course, specific to our needs, and this is where it starts
to be a little difficult to generalize. I shall try to make 
my list as full as possible, without claiming to in- 
clude everything, and ask you to ignore any items 
that are not relevant to your own set-up. Also, I give 
the list in a reasonably logical order of importance, 
although some people may attach greater importance 
to certain things than I do; again, I ask you to look at 
the overall principle rather than argue about which is 
more important. 

My natural sequence is based on that of a transla- 
tion company; a translation department will have 
less need to know, for example, the company name 
of the client, although I know of a number of
instances where, because the department also serves a
number of subsidiary companies within a large 
organization, it is essential to know the client com- 
pany name for internal records and billing purposes. 

A typical record will contain: 

• company name, address, telephone number, facsimile
number, modem number

• invoice address, if different from the work address

• contact name or names, and job titles, with separate
telephone numbers and fax numbers of each
contact, if they have a direct-dial number or a local
fax within their area, as distinct from the main
company numbers.

A salesman will normally want to keep a lot of
notes about each client, snippets of information that 
may one day come in useful, or just be good back- 
ground information (like names of children, hobbies 
and so on). For the most part, we have no need to 
keep that sort of detail, but, let us say, a memo field 
will be useful, for any non-compartmented data. 

Most clients will have specific standing instruc- 
tions about how their work is to be presented, how it 
is to be sent to them, how many copies, layout, 
typeface, and maybe even specific vocabulary. 

Some clients negotiate or are offered special 
rates; this needs to be noted, in order to be consistent 
and also to avoid loss of goodwill through failing to 
observe agreed rates.

This is all we generally need to record about our
clients. However, there is one other aspect worth
building into the record, and that is a history of work
done for any given client, whether completed or in hand, so that
you can go to the client record and find out what
work is in hand for them, or identify work done in the
past in case there is a need to refer back at some
future date.

Apart from the recently introduced need to know
the client’s VAT number (only if you are dealing
with clients located in another EEC country), only
two more things need to be incorporated to make the
record complete:

• a unique ID number, to eliminate any problems of
identity; no computer-based system could hope
to provide meaningful data without a unique record
identity

• the date of last update, or when the record was
created; apart from showing how old the record is,
it is also very useful in keeping track of price
changes.
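The client record described above maps naturally onto a single structured type. What follows is a minimal Python sketch under stated assumptions. The class and field names (`Client`, `Contact`, `client_id`, `billing_address` and so on) are illustrative choices, not the structure of any particular product. Free-text fields stand in for whatever detail a given office keeps.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Contact:
    name: str
    job_title: str
    direct_phone: Optional[str] = None  # only if different from the main number
    direct_fax: Optional[str] = None


@dataclass
class Client:
    client_id: int                          # unique ID that every job record refers to
    company_name: str
    address: str
    phone: str
    fax: Optional[str] = None
    modem: Optional[str] = None
    invoice_address: Optional[str] = None   # only if different from the work address
    contacts: List[Contact] = field(default_factory=list)
    standing_instructions: str = ""         # presentation, delivery, copies, layout, vocabulary
    special_rates: str = ""                 # negotiated rates, so they are applied consistently
    vat_number: Optional[str] = None        # needed only for clients in another EEC country
    memo: str = ""                          # free-text notes that fit no other field
    job_numbers: List[str] = field(default_factory=list)  # work done or in hand
    last_updated: date = field(default_factory=date.today)

    def billing_address(self) -> str:
        """Invoices go to the invoice address if one is given, else the work address."""
        return self.invoice_address or self.address


# Hypothetical example client
client = Client(client_id=1, company_name="Example Ltd",
                address="1 High Street", phone="0171 000 0000")
client.contacts.append(Contact("A. Smith", "Export Manager"))
```

In this sketch the `client_id` is the only field other records need to hold. The address and contacts can therefore change without breaking any link.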

Let us now move quickly on to Suppliers. 


Suppliers 

Not surprisingly, much of the information needed for
clients applies also to suppliers:

• company name (not always relevant, although
many individual freelance translators also have a
business name)

• supplier’s name, address, telephone, fax and modem
numbers

• what do they do? Are they translators, typesetters,
couriers, and so on?

• what equipment do they have? In particular, details
of their office hardware: PC or not, what size
diskettes, what software (and also what version),
whether they have a fax or modem, the type of
printer. There may be other items of specific interest,
but these are the main ones

• what is their normal target language (definition:
the language into which they normally work. I
refuse to get involved in debate over whether we
should consider mother tongue or language of
habitual use in this context; it is an argument
worthy of 20 minutes at least, on its own account)

• source language(s), from which they normally
work, preferably with some indication of level of
competence

• subject fields. This will be highly specific to your
own operation, since subject fields matter mainly for
suppliers who are translators and, possibly, interpreters,
and you will almost certainly have your own subject list.

We shall need the usual comment area for all the 
useful bits of information we gather, or notes about 
our suppliers. Bear in mind, however, the need to 
register the existence of your database. In the UK, 
this has to be done with the Data Protection Regis- 
trar, and anybody listed in your database has a right 
to see their entry. Incidentally, prosecutions by the 
Registrar for failure to register very often come from 
checking up whether a request for details and forms 
has been followed by an application. 

More important than this, though, are two other
areas: we listed ‘work done or in hand’ under clients,
and this is even more important to include for our
suppliers.

I have deliberately saved the most important — in 
operational terms — until last, and that is the prices 
the supplier charges, bearing in mind that there will 
be a number of subsets of information, depending on 
the worktype (translation, revision, typesetting, art- 
work, and so on) and the language or language pair 
involved; there may be an influence as a result of 
subject or urgency. It is also important to note the 
basis (lines, words of source language, computer 
target language words, etc.) and the currency. 
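A supplier’s prices vary by worktype and language pair, and each price carries its own basis and currency. One way to hold them is as a lookup keyed on worktype, source language and target language. This is a hypothetical sketch. The names `RateKey`, `Rate` and `cost` are illustrative, and so are the figures on the sample rate card, which are not real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateKey:
    worktype: str      # translation, revision, typesetting, artwork...
    source_lang: str   # "" where no language is involved (e.g. artwork)
    target_lang: str


@dataclass(frozen=True)
class Rate:
    amount: float      # price per unit of the basis
    basis: str         # e.g. "per 1000 source words", "per line"
    currency: str      # e.g. "GBP"


# Hypothetical rate card for one freelance translator (illustrative figures only)
rate_card = {
    RateKey("translation", "de", "en"): Rate(70.0, "per 1000 source words", "GBP"),
    RateKey("revision", "de", "en"): Rate(25.0, "per hour", "GBP"),
}


def cost(card, key, units):
    """Price a piece of work as (amount, currency).

    `units` must already be expressed in the rate's basis, e.g. 2.5 for
    2,500 words at a per-1000-words rate. A missing key raises KeyError,
    so work is never priced at a rate that was not agreed.
    """
    rate = card[key]
    return round(rate.amount * units, 2), rate.currency


print(cost(rate_card, RateKey("translation", "de", "en"), 2.5))  # (175.0, 'GBP')
```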

For both supplier and client, it is essential to note 
the date of the record creation or change, particularly 
when it comes to rates. As an illustration of how 
useful this can be, I recall one translator telling me 
that he was increasing his charges — fair enough, one 
expects prices to increase periodically, and the amount 
was not significant or unacceptable in itself — until I 
looked at his record and the history of price increases: 
he had done the same thing twice in the previous six 
months, and back-tracked swiftly when I not only told 
him the dates but the previous and new rates. So, if 
history is important, do not delete or overwrite such 
data, but copy them to some form of archive record — 
the memo field could well be used for this, for 
example. 
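Keeping the rate history, rather than overwriting it, can be sketched as an append-only list of dated rates. The minimal Python sketch below assumes one history per supplier, worktype and language pair. `RateHistory` and its methods are hypothetical names, and the dates and figures in the example are invented for illustration.

```python
from datetime import date, timedelta


class RateHistory:
    """Append-only history of one supplier's rate for one worktype and language pair."""

    def __init__(self):
        self.entries = []  # (effective_date, amount) pairs, kept in date order

    def record(self, effective, amount):
        # Never overwrite: the earlier rate stays on file beside the new one
        self.entries.append((effective, amount))
        self.entries.sort()

    def current(self):
        return self.entries[-1][1]

    def increases_since(self, cutoff):
        """Return (date, old, new) for every increase dated on or after cutoff."""
        return [(when, old, new)
                for (_, old), (when, new) in zip(self.entries, self.entries[1:])
                if when >= cutoff and new > old]


# Invented figures: a translator who has raised prices twice in six months
history = RateHistory()
history.record(date(1993, 1, 4), 60.0)
history.record(date(1993, 6, 1), 63.0)
history.record(date(1993, 9, 1), 66.0)
six_months_ago = date(1993, 11, 1) - timedelta(days=182)
print(history.increases_since(six_months_ago))
```

A third proposed increase could then be answered with the dates and the previous and new rates, as in the anecdote above.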

So far, I would imagine that there is little over 
which we might have any difference of opinion. We 
now pass on to an imaginary situation in which a 
client places an order with us. 


Jobs 

Our client sends us a job, let us say it is a typical text 
to be translated from English into French, Italian, 
German and Spanish — a regular diet of FIGS, you 
might say, and a common requirement in the UK. 

What do we need to know about it? A series of 
questions: who? what? why? when? how? and maybe 
where? or whither? 

Who is the client and the client contact — together 
with the details of name, address and all the details 
we noted when looking at the client, earlier. 

In our system, this is when we first use our unique 
references to identify and create the particular job: 
client order number, our job number, client and 
supplier IDs. 

What is the job — describe the requirement, in this 
case a text into four languages, maybe a brochure for 
publication, with translation and typesetting. It may 
well have special conditions attached, like a type 
specification, or instructions to send the typescript to 
the client's local agents for approval before setting. 

We received it today. When is it wanted and how 
important is that date? It might simply be an agreed 
timing for no particular need, or it might be as 
specific as, for example ‘not later than 2 p.m. or I 
shall miss my plane’. In any case, bearing in mind 
that we are looking at computerizing the whole process,
you cannot avoid setting some form of delivery
date, even if it is only an arbitrary internal date;
otherwise you cannot do anything about progress
reporting (how can a job be late in such a circumstance?).

To achieve the client’s required delivery, we 
must also consider when the job needs to be sent, and 
how it must be sent in order to arrive on time. 

Finally, at this level, have we quoted a price for 
doing the job? It is only too easy for this little bit of 
information to be forgotten at the end of the job, and 
I have no doubt that many of you have first-hand 
experience of jobs which have been done and in- 
voiced, totally forgetting the quotation. The result is 
a certain degree of annoyance on the part of the 
client, and justifiably so, because it is just as much a 
term of the contract as any delivery commitment and 
just as serious a breach as late delivery, although 
much more easily remedied. 

I said ‘at this level’ because all of this is what I
would describe as ‘top-level information’, relating to
the job as a whole and not to any specific part of it.
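The top-level job information can be gathered in one record. In this hypothetical Python sketch, all names are illustrative assumptions. The due date is a required field, because a job without a delivery date can never be reported as late. When a quoted price is present, it overrides the computed cost at invoicing time, so the quotation cannot be forgotten.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Job:
    job_number: str              # our unique reference
    client_id: int               # links back to the client record
    client_order_number: str
    description: str             # e.g. "brochure into FIGS, translation and typesetting"
    received: datetime
    due: datetime                # required: without it a job can never be "late"
    due_is_critical: bool = False
    quoted_price: Optional[float] = None
    delivered: Optional[datetime] = None

    def is_late(self, now):
        """True if the job is still undelivered after its due time."""
        return self.delivered is None and now > self.due

    def invoice_amount(self, computed_cost):
        """Honour the quotation if one was given; otherwise charge the computed cost."""
        return self.quoted_price if self.quoted_price is not None else computed_cost


# Hypothetical job: due by 2 p.m., with a quotation on file
job = Job("J1001", 1, "PO-77", "Brochure into FIGS",
          received=datetime(1993, 11, 18, 9, 0),
          due=datetime(1993, 11, 18, 14, 0),
          due_is_critical=True, quoted_price=450.0)
```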


Task 

The next level I defined earlier as a ‘task’. In this 
context, a task may be either one of a number of 
documents, a series of which comprises a job, or one
of a number of languages into which the document 
which constitutes the job has to be translated. Un- 
doubtedly, there are other ways of breaking the job 
process into levels or components, but this is one 
which has withstood the test of time and of practical 
application. 

In essence, not much detail needs to be controlled 
at this level, merely the task ID and the language pair 
or document ID, as applicable. As usual, it is advan- 
tageous to be able to put text-form notes or comments. 


Worktype

The next level is the interesting one, the worktype,
for it is the one where all the detail is entered:

• translation
• checking
• word processing
• proof reading
• correction
• checking-off corrections
• type mark-up ready for setting
• typesetting
• proof reading
• correction of typesetting galleys
• artwork, if appropriate
• courier, assuming, for the example, that a special
courier is needed for delivery to the client. It may even
be that several couriers are needed, for delivery to
and collection from the translator and typesetter,
all of which need to be logged and accounted for in
the final costing and scheduling.

Each worktype has its supplier, who will have all
the details we talked about earlier, all of which,
ideally, will be accessible. The prime information
needed, however, is that relating to the size and
costing of the worktype, and the worktype itself.


Batch/date level 

Below this, for each worktype, comes the schedule, 
the record of when the worktype is sent, when it is 
due, the degree of criticality of the delivery date, the 
method of return, and, finally, confirmation of the 
date on which the completed worktype is received. 
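The job, task, worktype and batch levels nest inside one another. The sketch below shows one way to model that nesting. It is a minimal Python illustration and assumes each level is held in a list inside its parent. The class names, fields and the two helper methods are hypothetical. They are chosen only to show how costing can be rolled up from the worktypes and how batches not yet received can be found.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Batch:
    sent: date
    due: date
    critical: bool                   # how much the due date matters
    return_method: str               # e.g. "fax", "courier", "modem"
    received: Optional[date] = None  # filled in when the work comes back


@dataclass
class Worktype:
    kind: str                        # translation, checking, typesetting, courier...
    supplier_id: int
    size: float                      # e.g. number of words or pages
    cost: float
    batches: List[Batch] = field(default_factory=list)


@dataclass
class Task:
    task_id: str
    language_pair_or_document: str
    worktypes: List[Worktype] = field(default_factory=list)
    notes: str = ""


@dataclass
class JobTree:
    job_number: str
    tasks: List[Task] = field(default_factory=list)

    def total_cost(self):
        """Roll the costs up from every worktype in every task."""
        return sum(w.cost for t in self.tasks for w in t.worktypes)

    def outstanding(self):
        """Yield (task_id, worktype kind, batch) for each batch not yet received."""
        for t in self.tasks:
            for w in t.worktypes:
                for b in w.batches:
                    if b.received is None:
                        yield t.task_id, w.kind, b


# Hypothetical two-language job with one batch back and one still out
job = JobTree("J1001", [
    Task("T1", "en>fr", [Worktype("translation", 7, 2500, 175.0, [
        Batch(date(1993, 11, 18), date(1993, 11, 22), True, "courier",
              received=date(1993, 11, 22))])]),
    Task("T2", "en>de", [Worktype("translation", 9, 2500, 180.0, [
        Batch(date(1993, 11, 18), date(1993, 11, 23), False, "fax")])]),
])
```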

At this stage, I pause to say that I am sure there
will be some among you who are thinking that all this
is unnecessary for you, because you only work for
departments within your own company and you only
ever need to allocate work internally. In response, I
would say that it may be more detailed than your everyday
needs require, but this outline still contains all the elements
that are essential record-keeping for you, too. As a
simple illustration, consider the case of a staff
translator who is suddenly away from work. He
is not the tidiest of workers, and so his desk is a mess,
with files and papers all over the place. He may
have some urgent work somewhere in the jumble, but
how can you find it, and how can you tell? A set of records kept
as I have described would save you even looking at
the desk until you have consulted the schedule: is
there anything desperate for delivery today? If so,
what is the job number and who is the client? You
can then go to the desk and look for a specific file,
or not bother at all if the log shows that there is
nothing urgent there. So much surer and so much
simpler.
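The check for an absent translator amounts to a simple filter over the schedule. This hypothetical Python sketch assumes a flat list of schedule entries, with in-house staff recorded as suppliers like anyone else. `ScheduleEntry`, `urgent_for` and the sample entries are illustrative names and data.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class ScheduleEntry(NamedTuple):
    job_number: str
    client: str
    supplier: str      # in-house staff appear here just like freelances
    due: date
    received: bool     # has the work come back yet?


def urgent_for(schedule: List[ScheduleEntry], supplier: str,
               today: date) -> List[Tuple[str, str]]:
    """(job number, client) for outstanding work held by one supplier, due today or overdue."""
    return [(e.job_number, e.client) for e in schedule
            if e.supplier == supplier and not e.received and e.due <= today]


# Hypothetical schedule for one absent staff translator
schedule = [
    ScheduleEntry("J1001", "Example Ltd", "staff-07", date(1993, 11, 18), False),
    ScheduleEntry("J1002", "Sample plc", "staff-07", date(1993, 11, 25), False),
    ScheduleEntry("J0990", "Example Ltd", "staff-07", date(1993, 11, 10), True),
]
```

An empty result is the ‘nothing urgent there’ case, and the desk can be left alone.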

Now look at the job we have just described: one 
job, into four languages, each of which contains a 
number of stages — and until you sit down and 
analyze just how many steps there are in a job, you do 
not realize the complexity of the operation. If only 
we could all get our clients to understand that you 
don't just pass something through your translator — if 
you do think that, then surely you deserve what 
comes out the other end, for that is what you are 
likely to get! 

Some of those stages can be ignored, as they are 
loops, an interworking between two suppliers until, 
in flowchart terms, the work passes on to the next 
stage. Nonetheless, there will still be somewhere in 
the region of 20 to 30 different suppliers and
worktypes active in the course of the one job. Then 
add, just for the sake of realism, the fact that no 
progress operation is likely to have only one job in 
hand at any given time. Highly reminiscent of the 
circus act in which the artiste keeps 24 plates spin- 
ning on the top of bamboo poles; it is no good being 
the juggler who boasts of being able to launch 24 
balls into the air at one time — he needs to be able to 
keep them there, or at least catch them safely on 
their way down. 

One thing becomes of paramount importance in
all this if we are to avoid chaos: every supplier,
every client, every job and every component of
every level of every job must, without fail, have a
unique identity. This is simple enough to achieve,
but it needs a little careful forethought if problems
are to be avoided.
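One way to make that identity unique by construction is to build each component's identifier from its parent's. The sketch below is a hypothetical scheme (job number, then language, worktype and batch codes, separated by slashes); the class and the ID format are illustrative assumptions, not ETM's actual numbering.

```python
import itertools


class IdAllocator:
    """Hypothetical allocator: every identity is its parent's ID plus a code."""

    def __init__(self, first_job=1000):
        self._job_numbers = itertools.count(first_job)
        self._used = set()

    def job(self):
        """Allocate the next top-level job number, e.g. 'J1000'."""
        return self._register(f"J{next(self._job_numbers)}")

    def child(self, parent, code):
        """Allocate a component ID such as 'J1000/FR' under an existing parent."""
        if parent not in self._used:
            raise KeyError(f"unknown parent {parent!r}")
        return self._register(f"{parent}/{code}")

    def _register(self, ident):
        # Refuse to hand out the same identity twice.
        if ident in self._used:
            raise ValueError(f"duplicate identity {ident!r}")
        self._used.add(ident)
        return ident


ids = IdAllocator()
job = ids.job()                         # 'J1000'
french = ids.child(job, "FR")           # one target language
translation = ids.child(french, "TR")   # one worktype within it
batch = ids.child(translation, "01")    # first batch: 'J1000/FR/TR/01'
```

Because every child ID embeds its parent's, two components can only collide if they share every level above them, and the allocator rejects that case outright.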

Let us now move on to using all these data. 


Reports, statistics, paperwork 

First, a definition: a report is a collection of data for
a specific purpose. In this context, therefore, an
address label is a ‘report’: it brings together and
prints the title, initials and surname, together with
the company name and address. A purchase order is
another form of report, and so too is the job history,
a record of all the activity within one job. Other
reports may be more general, such as those which log
the work in progress, which can be done in several
ways:

• job number order

• due date order (these two are the most obviously
useful ways)

• client order (either account number or company
name)

• or any other order you wish.
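Once each work-in-progress record carries its fields separately, the different report orders are simply different sort keys over the same data. A minimal sketch, assuming hypothetical records with job number, due date and client fields (the sample data is invented):

```python
from datetime import date

# Hypothetical work-in-progress records; field names are illustrative only.
wip = [
    {"job": "J1002", "due": date(1993, 11, 26), "client": "Acme Ltd"},
    {"job": "J1000", "due": date(1993, 11, 30), "client": "Bloggs & Co"},
    {"job": "J1001", "due": date(1993, 11, 24), "client": "Acme Ltd"},
]


def wip_report(records, order="job"):
    """One line per record, sorted by the chosen field (job number breaks ties)."""
    rows = sorted(records, key=lambda r: (r[order], r["job"]))
    return [f'{r["job"]}  {r["due"]:%d/%m/%Y}  {r["client"]}' for r in rows]


for line in wip_report(wip, order="due"):
    print(line)
```

Adding a new order means adding a field to sort on, not a new register to keep by hand.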

If you, or your boss, need to know how many
words have been translated this month from, let us
say, German into English, it is a matter of seconds to
produce a report which is valid and up to date at
whatever instant you call for it.

Once the information is in the system, it can be 
produced and presented in any way you like, as often 
as you like, and on screen, on paper, or as a file on 
disk, ready for export to another program. 
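The monthly query and the export to disk amount to filtering, totalling and writing out. The sketch below assumes a hypothetical log of completed translations with a finish date, language pair and word count; CSV is chosen only as a common interchange format, not because any particular program requires it.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical log of completed translations: (finished, source, target, words).
log = [
    (date(1993, 11, 2), "DE", "EN", 3200),
    (date(1993, 11, 9), "DE", "EN", 1450),
    (date(1993, 11, 9), "FR", "EN", 800),
    (date(1993, 10, 28), "DE", "EN", 5000),  # previous month, so excluded
]


def words_by_pair(entries, year, month):
    """Total words per (source, target) language pair for one calendar month."""
    totals = defaultdict(int)
    for finished, source, target, words in entries:
        if (finished.year, finished.month) == (year, month):
            totals[(source, target)] += words
    return dict(totals)


def to_csv(totals):
    """Render the totals as CSV text, ready to be saved for another program."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "words"])
    for (source, target), words in sorted(totals.items()):
        writer.writerow([source, target, words])
    return buffer.getvalue()


november = words_by_pair(log, 1993, 11)
print(november[("DE", "EN")])  # 4650
print(to_csv(november), end="")
```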

From practical experience, I can say that the
‘reports’ function of a computerized Progress
management system will enable you to save a
significant amount of time and increase your
efficiency. First entry of the data is generally more
time-consuming, for there is a lot of data for every
part of each job, as we have seen. But it can be
simplified and speeded up considerably, as you will
see in a few minutes, I hope.

A simple illustration should suffice to make the 
point: consider your present way of doing things, the 
records you keep of work passing through, and the 
paperwork you still have to generate. How many 
times in the course of a job do you write out the job 
number and the translator — or other supplier — name 
and address or similar information? Depending on 
your system, including the statistical reports of work 
done, probably between 5 and 10 times: job sheet, 
order form, address label, internal file slip, and so on. 
Similarly for the client. And each of the many 
worktypes needs to have the job number and descrip- 
tion, plus the specific-to-supplier instructions. 
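The cure for that repetition is to key each item in once and let every document draw on the same record. A minimal sketch with a hypothetical job record and three of the documents mentioned (job sheet, purchase order, address label); the field names and layouts are assumptions.

```python
# One job record, keyed in once (hypothetical fields and values).
job = {
    "job_no": "J1000/FR/TR",
    "description": "Installation manual, French translation",
    "supplier": {"name": "Ms A. Martin",
                 "address": ["12 High Street", "Ashford", "Kent"]},
    "instructions": "Use the client glossary; return text on disk.",
    "due": "30/11/93",
}


def job_sheet(j):
    return f"JOB {j['job_no']}: {j['description']} (due {j['due']})"


def purchase_order(j):
    return "\n".join([
        f"PURCHASE ORDER {j['job_no']}",
        f"To: {j['supplier']['name']}",
        j["description"],
        f"Instructions: {j['instructions']}",
        f"Due: {j['due']}",
    ])


def address_label(j):
    return "\n".join([j["supplier"]["name"], *j["supplier"]["address"]])


# Every document repeats the same details without anyone retyping them.
for render in (job_sheet, purchase_order, address_label):
    print(render(job), end="\n\n")
```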


Solving the problems — options 

Computers and databases have been around for quite
a few years now, and I am sure that many of you will
have spent either your own time or your company's
internal resources in devising some form of database
application that will do many, if not all, of the things
I have been talking about in this presentation.
Certainly, it is possible to arrive at some very elegant
solutions via this route, subject to limitations on
skill, time or money.

There are also a number of project management 
packages available on the market, such as 
SuperProject and Project Manager WorkBench. All
those I have seen or read about are intended for
project planning at a different level from ours. They
either do not provide certain functions that we would
consider essential, or provide them amid 50 other
functions that would be of no relevance to us at all. I
do not mean this in any way disparagingly, for they
are excellent programs, but not for our purpose.

For those who are fortunate enough to have in- 
company computer and programming experts, the 
problem is apparently easily solved. You give them a 
specification of what you want, and they produce it. 
If only! I know of three large organizations that have 
tried to go this route and their systems are still, let us 
be kind and say, not fully functional. Others have 
played about with databases and arrived at a simpli- 
fied system that covers some aspects ... adequately ... 
but leaves whole areas of work un(a)dressed, a bare 
system, you might say. 

Truly, it depends on the level of sophistication 
that will satisfy you. The problem is that ultimate 
satisfaction is always just around the next corner, 
after just this one more modification has been added, 
you know: ‘wouldn’t it be nice if...’

There is no doubt in my mind, after nearly two 
years of using one, that a computerized progress 
management system is indispensable, given the vol- 
ume and complexity of work that we are now having 
to handle. It is not the size of the job in terms of the 
number of words, but the number of volumes, docu- 
ments, languages, suppliers of all sorts, batches — in 
short, handling the multiplicity of components, all to 
the same level of detail and competence. This can
only be achieved, in my opinion, by electronic means,
for only in this way can one provide the detail
without the arduous and extremely time-consuming
repetition of essential information. Surely, if you can
save effort, it must make you more efficient. To give
one last set of statistics, recorded when we were
working on the theory of the program, we calculated
that for a normal job to go through from receipt to
despatch, there were 123 separate operations that
had to be carried out within the Progress area. Of
these, for example, 7 involved writing the translator's
name, 5 were duplications of the language pair, and a
similar number of repetitions were needed for the
client name, due date and so on, not to mention the
paragraphs needed to produce the special instructions
for the job. With an electronic system, all this is
done automatically, at the press of a key, once the job
has been compiled. Need I say more?

It is perhaps interesting to note that the program
I use, ETM, embodies all the concepts I have been
talking about and, for those of you who are seeking
or have already obtained accreditation to ISO 9000
or its national equivalents, the design and operation
of the program has been described by a BS5750 lead
assessor (a former BSI inspector) as being ‘materially
significant in addressing the requirements of the
standard with regard to the establishing and maintaining
of essential formal records and procedures’.
It is central to my own operation and, frankly, it
would be impossible for me to cope with my throughput
without it.


ETM - history and approach to the problem 
The Electronic Translations Manager — ETM — first 
started as an idea of mine, the first notes on paper 
dating right back to 1983. The amount of duplicated 
effort in my office was horrifying, and life was 
consequently full of panic through workload, poten- 
tial crises only being averted by the competence of 
the people doing the job. Computers were back at the 
level of the dedicated mini-computer word proces- 
sor, and we were happily using Wordplex equipment. 
A heady moment when we thrilled to the power and 
speed of our first XT, compared with some of the 
things we had seen. And think of the speed of an XT 
compared with the 486/66 machines we have now, 
over 100 times as fast. 

The idea grew, on paper, with many helpful and
critical comments from my staff until 1988, when we
decided to write the program and install it. Bruce
Carroll and I spent many late and happy hours putting
thoughts on paper and testing out ideas and structures.
We were still finding that our wishes were
ahead of the computing power of the programs and
equipment available at the time. Nevertheless, we
persevered and had a workable program ready to put
on the market at the beginning of 1992. Then out
came a new version of Clipper, the language in which
ETM had been written, followed shortly after by two
upgrade releases of Blinker, the linker that enables
all the components to interact.

These upgrades between them delayed the
successful introduction of the program, as we
decided to rework the program at the same time as it was
upgraded with the new Clipper release. I am happy to
say that we now have an extremely powerful and 
stable program, independently tested and approved 
by specialists. It has been in small-scale daily opera- 
tion now for nearly two years and, we feel, is capable 
of meeting the needs of a specialist market, regard- 
less of the size of the user. It is suitable for translation 
departments which only work for internal requesters, 
and which do all, or nearly all, their work in-house; it 
is equally suitable for the small or large translation 
company, with its span of clients, internal translators 
and external freelance resources. 

At the moment, it is a DOS program, but it runs 
daily within Windows and is also stable in OS/2. It 
works over a network and multiple users can have 
simultaneous access. We are looking to have a true 
‘Windows’ version operational early in the New
Year. The compiler has just recently been released,
but we do not know what surprises it will hold for us.

Much of the speed of performance and accuracy
of entry is achieved through the extensive use of
‘picklists’: database lists that are consulted by the
program, from which the operator has to pick an entry
before passing on to the next box on the screen. This
not only forces an answer, but ensures that it is
consistently presented: none of the arbitrary
abbreviations for languages that we so often see, all
different and sometimes leading to confusion. And
beyond the confusion, how can one possibly automate
any form of statistical reporting if the language pairs
themselves are not written in such a way that the
machine can pick them up?
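A picklist enforces a closed vocabulary at the point of entry. The sketch below models only the validation side, assuming a hypothetical list of two-letter language codes; in ETM the operator picks from a displayed list, whereas here typed entries are normalized and anything not on the list is rejected.

```python
from collections import Counter

# Hypothetical language picklist; a real system would read it from a lookup table.
LANGUAGES = ("DE", "EN", "ES", "FR", "IT", "NL")


def pick(entry, picklist):
    """Accept an entry only if it is on the picklist; return its canonical form."""
    value = entry.strip().upper()
    if value not in picklist:
        raise ValueError(f"{entry!r} is not on the list {picklist}")
    return value


# Variants that would otherwise be recorded differently collapse to one code,
# so statistics can count them.
entries = ["de", "De ", "DE", "fr"]
counts = Counter(pick(e, LANGUAGES) for e in entries)
print(counts)
```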


Worktypes

Translation
Checking (revision, editing)
Wordprocessing
Proofreading
Correction
Checking off corrections
Type markup ready for setting
Typesetting
Proofreading
Correction of typesetting galleys
Artwork, if appropriate
Courier
...and so on





Database areas

Client
Supplier
Job history
Task
Worktype
Batch
Job sheet
Job contact
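These database areas suggest a hierarchy in which a job contains tasks (one per target language), each task contains worktypes, and each worktype is sent out in batches, with clients and suppliers held separately. How ETM actually relates its tables is not stated, so the following is only a hypothetical reconstruction, with a query that walks the hierarchy to find outstanding batches:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Batch:
    batch_id: str
    sent: str
    due: str
    received: bool = False


@dataclass
class WorktypeEntry:
    worktype: str   # e.g. "Translation", "Proofreading"
    supplier: str   # key into the supplier table
    batches: List[Batch] = field(default_factory=list)


@dataclass
class Task:
    target_language: str  # one task per target language
    worktypes: List[WorktypeEntry] = field(default_factory=list)


@dataclass
class Job:
    job_no: str
    client: str    # key into the client table
    contact: str
    tasks: List[Task] = field(default_factory=list)

    def outstanding_batches(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (language, worktype, batch ID) for every batch not yet received."""
        for task in self.tasks:
            for entry in task.worktypes:
                for batch in entry.batches:
                    if not batch.received:
                        yield task.target_language, entry.worktype, batch.batch_id


job = Job("J1000", client="C042", contact="Mr J. Smith", tasks=[
    Task("FR", [WorktypeEntry("Translation", "S01", [
        Batch("J1000/FR/TR/01", sent="01/11/93", due="15/11/93", received=True),
        Batch("J1000/FR/TR/02", sent="08/11/93", due="22/11/93"),
    ])]),
])
print(list(job.outstanding_batches()))
```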

Typical Reports

Address labels
Job history
Purchase order
Telephone list
Work in progress
  By due date
  By client
  By language
  By supplier
Client list (name, telephone/fax numbers, with or without address)
Supplier list, similarly
Language and subject lists (who does what)
Statistical reports (content optional)





History options: [Job record] [Worktype level] [Date level] [Invoices] [Quit to main menu]

ETM - Main Menu
Main databases: [Jobs] [Clients] [Suppliers] [Invoices]
Utilities: [Lookup lists] [Data utilities]
Reports: [Standard reports] [Report generator]
[Quit ETM program]

Other ETM screens (captions and field labels):
Phone & Fax; Quotation
Task details
Worktype details: ID, Worktype, Phone/Fax, Units, Measure, Costs: Approved (ex VAT)
Batch details: ID, Date sent, Due, Urgency, Method, Notes, Received
Supplier: Name, Last name, Street, Town, County, Post Code, Printer
Rates (see memo for notes): Worktype, Rate, Cur, Unit, Min?, Level, Updated
Jobs in progress: Job, Worktype, Units, Measure, Notes
VAT codes: Code, Rate
Client details: Building/site, Street, Town/city, County, Postcode, Country, Main phone, Fax, Email, Telex
Client contacts: Title, First name
Client job summary: ID, Ref, Contact, Order reference, Description, Date wanted
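The Rates screen (worktype, rate, unit and a ‘Min?’ column) and the VAT codes screen hint at how a task might be costed. The sketch below assumes that ‘Min?’ marks a minimum charge and that VAT is applied to the net cost by code; the tables, the 17.5% example rate and the rounding rule are all illustrative assumptions, not ETM's documented behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical tables modelled loosely on the Rates and VAT codes screens.
RATES = {  # (supplier, worktype) -> (rate per unit, unit, minimum charge)
    ("S01", "Translation"): (Decimal("60.00"), "1000 words", Decimal("25.00")),
}
VAT_CODES = {"S": Decimal("0.175"), "Z": Decimal("0")}  # example rates only


def task_cost(supplier, worktype, units, vat_code):
    """Return (net, VAT, gross) for one task, applying any minimum charge."""
    rate, _unit, minimum = RATES[(supplier, worktype)]
    net = max(rate * units, minimum).quantize(CENT, rounding=ROUND_HALF_UP)
    vat = (net * VAT_CODES[vat_code]).quantize(CENT, rounding=ROUND_HALF_UP)
    return net, vat, net + vat


print(task_cost("S01", "Translation", Decimal("3.2"), "S"))
```

Decimal arithmetic is used rather than floating point so that money amounts round exactly to the penny.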

What is Controlled English? Its use and application.
The rules of Controlled English and their meaning.
Max: a Controlled English tool.
The Controlled English dictionaries.
Implementation of Controlled English within Bull.
Guidelines for implementing a Controlled Language
system with or without Machine Translation.


Introduction 

Although I am going to talk about Controlled English,
in particular as used in Bull SA*, most of my
presentation applies to any language. In fact, Controlled
French is being developed at Bull along very
similar principles. Controlled English was introduced
in Bull by ILO (Internationalization and Localization
of the Offer). This department also introduced
Machine Translation to Bull; thus Controlled English
and Machine Translation were, from the start,
closely connected, although the importance of
Controlled English is now seen to be much wider.

I want to answer three questions. What is Con- 
trolled English? Where should we use it? Why should 
we use it? I shall also explain the development of 
Controlled English within my own company and the 
rules and techniques which we have developed.

1. Controlled English is a set of grammar and style 
rules imposing simple structures on written English, 
plus a restricted vocabulary. 

2. Controlled English is primarily for use in technical
documentation. This covers all fields:
electronic, computing, medical, scientific, etc. It 
is not designed for literature, poetry, love letters or 
conference papers for linguists. 

3. There are several reasons why Controlled English 
should be used: 

• In an international marketplace, many users of
documentation have English as their second, or
even third, language. Technical documentation
should therefore use simple grammatical structures
which are easy to understand.

• Controlled English provides a coherence of style
and vocabulary throughout a manufacturer’s documentation,
thus avoiding jargon which may have
different meanings in different documents or,
conversely, different terms for the same concept.


• Recently, particularly in the field of computers,
technology has become available to a wider public.
The end user is not necessarily a well-educated,
highly literate person, and needs simplified
English in order to fully understand the instructions
given. Recently, I read an article in the French
journal Communication et langages which discussed
user instructions for domestic appliances,
pointing out that many washing machines now
have fifteen programmes, but very few people use
more than two because they do not understand the
instructions. When they buy another machine they
change manufacturer because they are not satisfied
with the whiteness of their wash! The use of a
Controlled Language therefore has a clear long-term
economic impact on the manufacturer of any
product which includes instructions on its use.

• Texts written in Controlled English are easier to
translate by machine, owing to the lack of ambiguity,
a simplified grammar and a known vocabulary.

• Easier machine translation results in lower publication
and time-to-market costs.

• Standard terminology throughout a manufacturer's
documentation makes the user's life easier
and increases customer satisfaction.

• Documentation is easier to update, again reducing
time and cost.

• Controlled English gives access to a wider global
market. This leads to increased sales.


The Bull controlled English rules 

Bull Controlled English has ten rules. These are: 

1. Make positive statements; avoid the passive voice;
avoid the future tense.

2. Keep sentence length to a maximum of 25 words. 

3. Use valid terminology; do not invent it. Use the 
Controlled English vocabulary. 

4. One thought per sentence. 

5. Use simple sentence structures. 

6. Use parallel construction. 

7. Avoid conditional tenses. 

8. Avoid abbreviations and colloquialisms. 

9. Use correct punctuation. 

10. Use the tools available (Max, Grammar Checker, 
Spelling Checker). 
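Several of these rules are mechanical enough to check automatically. The following toy checker is not Max, whose internals are not described here. It flags sentences over 25 words, ‘will’ or ‘shall’ as a sign of the future tense, modal verbs such as ‘should’ and ‘may’ (rule 7), and a crude passive pattern. The regular expressions are rough heuristics and will produce false positives.

```python
import re

MAX_WORDS = 25  # rule 2
FUTURE = re.compile(r"\b(?:will|shall)\b", re.IGNORECASE)  # rule 1, future tense
MODALS = re.compile(r"\b(?:should|would|could|may|might)\b", re.IGNORECASE)  # rule 7
# Rule 1, passive voice: a form of "to be" followed by a word ending in -ed/-en.
# Crude: it misses irregular participles and flags phrases such as "is open".
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)


def check_sentence(sentence):
    """Return a list of rule problems found in one sentence."""
    problems = []
    if len(sentence.split()) > MAX_WORDS:
        problems.append("longer than 25 words")
    if FUTURE.search(sentence):
        problems.append("future tense")
    if MODALS.search(sentence):
        problems.append("conditional")
    if PASSIVE.search(sentence):
        problems.append("possible passive")
    return problems


def check_text(text):
    """Map each offending sentence to its problems; clean sentences are omitted."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return {s: p for s in sentences if (p := check_sentence(s))}


print(check_text("Select the drink you want. An action is performed by the machine."))
```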

What the rules mean

The first rule, on the passive voice and the future tense,
concerns a basic element of style in technical writing. In
general, the Technical Author should be living in a
permanent, active present. Rather than ‘an action is 
performed by the machine’, we use ‘the machine 
performs an action’. In using the future, we go into 
the realm of the unknown. The Author is not clair- 
voyant. When we say ‘the effect will be’, this depends 
on the product remaining stable. For an electric 
toaster, which never has user upgrades, this is fine, 
but for aircraft, computers, military systems etc. we 
do not know if what is described now will always be 
the case. One can say that this is rather pedantic, as 
product upgrades imply documentation revision, but 
the positive present tense does clarify for later Authors 
what the product actually does. 

I have often heard Technical Authors complain 
about the second rule. There is only one reason why 
sentences should be long, and that is when the Author 
does not understand the subject. It is extremely diffi- 
cult to hide a lack of knowledge in short sentences 
and there is quite an art in writing long, complex 
sentences which give the same choice of interpreta- 
tion that the Author has. There is also quite an art in 
translating such text so as to be equally vague in the 
target language. 

In any field where a Controlled Language is 
used, it is vital that the terminology is correctly 
defined before asking Technical Authors to use this 
writing technique. Without this preliminary work, 
the third rule is meaningless. There must be a 
technical and general dictionary supplied to the 
Author. I have come across terms which I could not 
find in a standard office dictionary. How is a user 
with English as his second language going to 
understand? 

The fourth rule, ‘one thought per sentence’ (and, as I
say in my training courses, at least one thought), is
closely linked to the next rule. It leads to
simple sentence structures and sentences which are
easy for the user to follow.

The fifth rule on simple sentence structures allows
for three basic structures: statement (description
or explanation), step and action, and cause and
effect. To take the example of getting a drink from a
machine we can say, ‘Select the drink you want after 
putting the money in the machine. Take the cup when 
the machine indicates that the drink is ready.’ In 
Controlled English, we would say, ‘Put the money in 
the machine. Select your drink. When the machine 
indicates that the drink is ready, remove the cup.’ 
The Controlled English version follows a logical 
sequence of single steps. 

The sixth rule on parallel construction is obvious 
to many foreigners, but not to native English speak- 
ers. In everyday English we quite often change tense 
in mid-sentence. This is not possible in French, for 
example. Sentences with mixed tenses are difficult 
for Machine Translation systems. 

The seventh rule on conditional tenses is particularly
important for clarity and safety. The word
‘should’ implies a choice. Imagine for a moment the 
effect of using ‘should’ throughout an aircraft main- 
tenance manual! Similarly, the word ‘may’ should be 
avoided. I recently saw ‘This may lead to unpredict- 
able results.’ So even the unpredictability was 
unpredictable. The Author should use ‘must’, ‘can’ 
and the present tense, for example ‘This leads to 
unpredictable results.’ 

Rule eight, on abbreviations and colloquialisms, 
concerns clarity of the text. Abbreviations often have 
several meanings. Of course it is not always possible 
to give the expansion of abbreviations throughout 
the text and maintain readability. Every document 
should have an appendix listing all the abbreviations 
used and their meaning. Colloquialisms such as ‘O.K.’
or ‘power up’ are vague. The Author should always 
be precise in values, conditions and actions. 
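Compiling such an appendix can be partly automated by collecting every abbreviation in the text and looking it up in a glossary. A minimal sketch, assuming that abbreviations are runs of two or more capitals or dotted capitals such as ‘O.K.’; the glossary entries are supplied by the caller, and anything missing is marked for the Author to define.

```python
import re

# Dotted capitals such as "O.K.", or runs of two or more capitals such as "ILO".
ABBREVIATION = re.compile(r"\b(?:[A-Z]\.){2,}|\b[A-Z]{2,}\b")


def abbreviation_appendix(text, glossary):
    """List each abbreviation used in the text with its meaning from the glossary."""
    found = sorted(set(ABBREVIATION.findall(text)))
    return [(abbr, glossary.get(abbr, "?? not defined")) for abbr in found]


glossary = {"ILO": "Internationalization and Localization of the Offer",
            "MT": "Machine Translation"}
for abbr, meaning in abbreviation_appendix("ILO runs the MT engine. Press OK.", glossary):
    print(f"{abbr:6} {meaning}")
```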

Rule nine is self evident. Incorrect punctuation 
can change the sense of a sentence to the reader and 
will often confuse a Machine Translation system. 

Rule ten is important for Machine Translation.
The ideal is to use a Controlled English tool, ensuring
that the vocabulary used is correct. Failing that, many
grammar checkers will pick up the structural aspects
of Controlled English. Spelling mistakes can lead to
mistranslation by Machine Translation systems.


Max - a Controlled English tool 

At Bull, we use a tool for implementing Controlled 
English where the text is to be translated. This tool is 
called Max, and was produced by Smart Communi- 
cations of New York. It comes in two forms: batch 
and interactive. 


What does Max do? 

Max applies the first nine rules of Bull Controlled 
English. It checks grammar, sentence structure, punc- 
tuation and vocabulary. Max uses three dictionaries 
to control vocabulary. 

The synonym dictionary. This is the reverse of a 
normal synonym dictionary. Whereas normally we 
go from one word to many words, the object of the 
Max synonym dictionary is to pick up a wide range 
of words and phrases with the same meaning and to 
propose a small number of words and phrases to 
replace them. The primary entries do not appear in 
the other dictionaries, but all the proposed alterna- 
tives appear in one of the other dictionaries. The 
basic synonym dictionary was provided by Smart 
Communications and refined by me for Bull's use. 

The general dictionary. This dictionary contains everyday
words of the English language. The general
dictionary was supplied by Smart Communications
and modified by me for Bull's use.

The technical dictionary. Max is supplied with a
range of technical dictionaries for various domains.
However, the technical dictionary used by Bull was
generated entirely by ILO.
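Taken together, the three dictionaries support a simple vocabulary check. Phrases found in the synonym dictionary get their approved replacements, and any remaining word that is in neither the general nor the technical dictionary is reported as unknown. The sketch below uses tiny invented dictionaries and is not Max's actual algorithm. It also asserts the property described above, that every proposed alternative is itself in one of the other dictionaries.

```python
import re

# Tiny invented dictionaries standing in for the real ones.
SYNONYMS = {  # rejected word or phrase -> approved replacements
    "utilize": ["use"],
    "commence": ["start"],
    "power up": ["switch on"],
}
GENERAL = {"the", "a", "to", "and", "use", "start", "switch", "on"}
TECHNICAL = {"printer", "program", "reboot"}

# Every proposed alternative must itself be known to one of the other dictionaries.
for alternatives in SYNONYMS.values():
    for alternative in alternatives:
        assert all(w in GENERAL or w in TECHNICAL for w in alternative.split())


def check_vocabulary(sentence):
    """Return (replacements proposed, unknown words) for one sentence."""
    text = sentence.lower()
    replacements = {phrase: alts for phrase, alts in SYNONYMS.items()
                    if re.search(rf"\b{re.escape(phrase)}\b", text)}
    # Words already covered by a synonym entry are not reported again as unknown.
    covered = {w for phrase in replacements for w in phrase.split()}
    unknown = sorted({w for w in re.findall(r"[a-z]+", text)
                      if w not in GENERAL and w not in TECHNICAL and w not in covered})
    return replacements, unknown


print(check_vocabulary("Power up the frobnicator."))
```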


Aslib Proceedings, vol.46, no.5 


Controlled English with and without machine translation 





Implementation of Bull controlled English 


The first stage 

Controlled English was first used in Bull in ILO. 
Max was used in batch mode for pre-editing text 
before translation. The resulting text was never in- 
tended to be read by human beings, the Controlled 
English produced being adapted specifically for the 
translation engine. 


The second stage 

The next stage was to implement Controlled English 
with the Technical Authors. By this time ILO had 
developed a multi-lingual terminology database. The 
foundation of this database was the English termi- 
nology, supplied with the codes necessary for Max. 
The translation of the database was then performed 
for our initial target languages (Dutch, French, Ger- 
man, Italian and Spanish). The database was used to 
produce the technical dictionary for Max and the 
dictionaries for the translation engine. Next we put 
the general dictionary into the database and created 
the entries for the target languages. If we change our 
translation system, we will be able to generate the 
technical and general dictionaries for the new system 
from our existing database. In this way, all technical 
terms and general vocabulary used by the Authors 
are known by the translation system. 
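The point of a single source database is that each tool's dictionary becomes an export rather than a separate list to maintain. A minimal sketch, assuming hypothetical term records that hold the English headword, an invented checker code and the target-language equivalents:

```python
# Hypothetical term records: English headword, an invented checker code,
# and the equivalents for each target language.
TERMS = [
    {"en": "printer", "code": "N", "fr": "imprimante", "de": "Drucker"},
    {"en": "reboot", "code": "V", "fr": "redémarrer", "de": "neu starten"},
]


def checker_dictionary(terms):
    """English-only dictionary for the writing checker: headword -> code."""
    return {t["en"]: t["code"] for t in terms}


def translation_dictionary(terms, target):
    """Bilingual dictionary for one target language, skipping untranslated terms."""
    return {t["en"]: t[target] for t in terms if t.get(target)}


print(translation_dictionary(TERMS, "de"))
```

Changing translation system would then mean writing a new export function, not re-entering the terms.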


Technical Authors and Machine Translation. The 
Authors use the interactive version of Max. They 
write a paragraph, then ‘Max’ it. They are not obliged 
to follow the rules, but should rarely deviate from 
them. The text files arrive at ILO with the associated 
error files. Here, in the pre-editing stage, we can see 
if unknown terms have been used. In a fast moving 
industry new terminology is being created daily, and 
by spotting new terms immediately they are used, we 
can update our database and translation dictionaries, 
and reissue the Max technical dictionary. Obviously,
not every term missing from the dictionaries is a
valid new term; the invalid ones are corrected at
the pre-editing stage. If a particular Author is
consistently failing to apply the rules and use the correct
vocabulary, we can follow the problem up with the
Documentation Department.

Max provides a temporary dictionary. This can 
be used by the Author for terms which are not de- 
fined in Max, but which he uses frequently. We can 
use this temporary dictionary as input to update our 
terminology database. 
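This feedback loop from Authors back to the terminology database can be sketched as merging the unknown words from error files with the Authors' temporary dictionaries and ranking the candidates. It is a hypothetical illustration; the decision on whether a candidate is a valid new term stays with a person.

```python
def candidate_terms(error_files, temporary_dictionaries, terminology):
    """Rank words not yet in the terminology database by how often they occur."""
    counts = {}
    for words in list(error_files) + list(temporary_dictionaries):
        for word in words:
            word = word.lower()
            if word not in terminology:
                counts[word] = counts.get(word, 0) + 1
    # Most frequent first; each candidate still needs a human decision.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


errors = [["hotswap", "teh"], ["hotswap"]]  # unknown words per error file
temporary = [["HotSwap", "raid"]]          # one Author's temporary dictionary
print(candidate_terms(errors, temporary, terminology={"printer", "raid"}))
```

Here ‘hotswap’ is a plausible new term, while ‘teh’ is a typing error to be corrected rather than added.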


The third stage 

The final stage, which is currently being implemented,
is Controlled English throughout the company. For the
second stage (Technical Authors), a manual was written
explaining the rules of Controlled English and also
providing guidelines on style. This manual, along with
paper versions of the Max technical and general
dictionaries, is available to everyone writing internal
or user documentation, whether or not it is to be
translated. The eventual aim is to provide Max to
everyone. The advantage of using Controlled English
for internal documentation is that many originators and
readers of internal documentation are not native English
speakers. We are also looking into an English language
editor designed specifically for French speakers.


Guidelines for implementing a Controlled 
Language system with or without Machine 
Translation. 

These guidelines are based on the experience of Bull 
and will help you to implement a Controlled Lan- 
guage system which is easy to use and can evolve 
with your activities. 


Basic requirements 

The first step is to decide on your rules. Controlled
Language systems more or less follow the rules I
have described, with language-specific variations
and differences according to the method of working.
For example, if the Author reads his translations,
include a review of the translated text against the
original English, so that he gains experience of which
forms are easier to translate.

The second step is to create your technical and 
general dictionaries. You can have one single techni- 
cal dictionary or individual dictionaries for different 
domains. Controlled Language systems must have 
dictionaries with the controlled vocabulary. 


Machine Translation 

Make sure that your translation system knows the 
Controlled Language vocabulary. Our experience of 
building a terminology database starting with the 
source language (in our case, English) has proved 
invaluable in maintaining coherence at every stage in 
document production, from writing through pre-editing,
translation, post-editing and revision. Coherence
of the Controlled Language and translation diction- 
aries saves time and money. 


Conclusion 

I hope that in this brief overview of Controlled 
Language, with specific reference to English, I have 
managed to demonstrate the advantages of Control- 
led Language systems, not only from an economic 
point of view, but also as a tool for easier communi- 
cation in our global village. 

Abstract

WordPerfect offers several facilities to handle foreign languages and multilingual documents. This paper discusses
two aspects of language handling in WP: the language code, a WP formatting code that gives access to
language modules, and the keyboard editor, which facilitates entering foreign characters. The paper discusses the
possibilities offered in version 5.1 of the program. The last section discusses improvements in WP 6.0.


Introduction 

WordPerfect (WP) offers a number of facilities to 
handle foreign languages and foreign characters. In 
this paper, I will concentrate on two of the major 
aspects of foreign language handling, namely the 
language code and its implications for hyphenation 
and spell checking, and the keyboard editor, which is 
essential for those who frequently need characters 
not represented on the keyboard. I will also show 
how to create a keyboard layout that is sensitive to 
the language code. Such a keyboard layout is con- 
venient for typing multilingual documents. 

The WP facilities discussed here and the tech- 
niques outlined to facilitate typing accented characters 
hold for WP version 5.1. Recently, however, a new 
version of the WP program has been released, WP 
6.0. Though the linguistic facilities of WP 6.0 are 
essentially the same as in WP 5.1, they are more 
sophisticated in some respects. But since the major- 
ity of the users will still be using WP 5.1, and since 
there were no language modules available at the time 
of writing, I will concentrate on version 5.1. Any 
relevant changes and improvements in WP 6.0 will 
be discussed in the last section. 

To designate keystrokes required in WP, I will
use the following conventions. Two keystrokes
separated with a hyphen mean that you press the first, 
hold it down and then press the second. For example, 
Shift-F8 means that you press the Shift key, hold it 
down, and then press the F8 key. Keys separated by 
a comma mean that you press the keys one after the 
other. For example, Home, Enter means that you 
should press the Home key, release it, then press the 
Enter key. 
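The convention is regular enough to be parsed mechanically: commas separate successive steps, and hyphens join keys held down together. A small sketch, assuming key names never contain a hyphen or comma themselves (so it cannot describe the minus key):

```python
def parse_keystrokes(notation):
    """Split 'Shift-F8, 4, 5' into [['Shift', 'F8'], ['4'], ['5']].

    Commas separate successive steps; hyphens join keys held down together.
    """
    return [[key.strip() for key in step.split("-")] for step in notation.split(",")]


def describe(notation):
    """Spell the notation out in words."""
    steps = []
    for keys in parse_keystrokes(notation):
        if len(keys) == 1:
            steps.append(f"press {keys[0]}")
        else:
            held, last = keys[:-1], keys[-1]
            steps.append(f"hold {'+'.join(held)} and press {last}")
    return "; then ".join(steps)


print(describe("Shift-F8, 4, 5"))
```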


The language code 

The basis for WP’s ability to handle foreign languages
is the language module. A language module
consists of a word list, which is used by the spell
checker, and a hyphenation file, which is consulted
when hyphenation is enabled. In many cases, a
language module also includes a thesaurus (a 
dictionary that you can use to look up synonyms and 
antonyms), a keyboard driver to facilitate typing 
some special characters and a screen font to display 
characters not contained in the standard IBM character 
set. For example, the Hungarian language module 
includes a screen driver to display the ő and the ű.
Each version of WP includes the language module of 
the package language. For example, the English 
version comes with the English language module 
and the German version with the German language 
module. Language modules can be bought 
separately. 

Within WP, a language module is accessed by
using the language code. This code determines which
language module WP will use after the point where it
is inserted. For example, if you enter the French
language code in a document, WP will use the French
dictionary for spell checking the document and the
French hyphenation module to hyphenate words
from that point onwards. Also, when you spell check
a document and you add words to the additional
word list, these words will be added to the French
word list.
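The effect of the language code can be modelled as a scan through the document in which each word belongs to whichever language was most recently set. The sketch below stands in for WP's hidden code with a hypothetical inline marker such as [Lang:FR], and assumes UK English before the first code; it does not read real WP files.

```python
import re

# Hypothetical stand-in for WP's hidden language code, e.g. "[Lang:FR]".
LANG_CODE = re.compile(r"\[Lang:([A-Z]{2})\]")


def words_by_language(text, default="UK"):
    """Group the words of a document by the language code in force at each point."""
    groups = {}
    language, position = default, 0
    for match in LANG_CODE.finditer(text):
        for word in re.findall(r"\w+", text[position:match.start()]):
            groups.setdefault(language, []).append(word)
        language, position = match.group(1), match.end()
    for word in re.findall(r"\w+", text[position:]):
        groups.setdefault(language, []).append(word)
    return groups


print(words_by_language("Press Enter. [Lang:FR]Appuyez sur Entrée. [Lang:UK]Done."))
```

Each group would then go to the spell checker and hyphenation module of its own language.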
Like other WP codes, the language code is a
formatting code. You enter it as follows: press Shift-F8,
4, 5 (Format, Language, Other) and type the
language abbreviation (for example, FR for French,
UK for British English, US for American English).
Then press Enter until you are back at the edit screen.
Since the language code is a formatting code like any
other code, it can be inserted into a document in any
place and as frequently as is necessary. And you can
enter as many different language codes as you have
language modules. Thus, it can be used in an alternating
French-English document, but also in
‘EC documents’ which contain all EC languages.


The spell checker and the hyphenation module
are well documented (see for instance ...), and I
will therefore not discuss them here, but rather move
on to keyboards and typing accented characters.


The keyboard 

Computer keyboards contain only a limited number 
of keys. Keyboards for use in the US and in Britain 
contain only the twenty-six base letters (i.e. unac- 
cented letters). In some countries, keyboards contain 
some accented characters; but no keyboard contains 
all accented letters. WP, in contrast, knows about 
1,900 characters distributed over 11 character sets, 
and by using the special Compose feature, all these 
characters can be typed with relative ease using the 
limited number of keys on the keyboard. Even char- 
acters not included in any character set can be typed 
using the Overstrike feature. Below I will discuss the 
notions character set, Compose key and Overstrike 
in detail. 


Character sets 
All the WP characters are contained in character sets. 
For example, character set 0 contains the ‘standard’ 
characters, i.e. the characters you also find on the 
keyboard. Character set 1 contains a number of floating accents (accents without a letter, such as the ¨) and a large number of accented characters. Other character sets contain Greek, Hebrew, Arabic, Cyrillic, Japanese, typographical symbols and mathematical symbols. Overviews of the other character sets can be found in the WP manual and most books on WP. Kahrel¹ also discusses some inconsistencies in set 1 and how they can be solved. For the purposes of this paper I will concentrate on character sets 1 and 2.

Each character is identified by the character set number and the position within the character set. For example, the ą is character 95 in character set 1. By convention, characters are designated as set,number. Thus, the ą is designated as character 1,95. In the remainder of this paper I will use this convention.
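The set,number designation is essentially a two-part key into a table of characters. The Python sketch below illustrates only that idea: the lookup table is hypothetical, holds just the WP 5.1 codes mentioned in this article, and maps them to Unicode for display, which is an assumption of the sketch (WP itself uses its own character sets, not Unicode).

```python
# Hypothetical lookup table: only the WP 5.1 codes mentioned in this article,
# mapped to Unicode for display. A real table would cover every character set.
WP51_CHARS = {
    (1, 7): "\u00a8",    # floating diaeresis (umlaut)
    (1, 24): "\u0131",   # dotless i
    (1, 95): "\u0105",   # a with ogonek
    (1, 117): "\u011f",  # g with breve
}


def parse_code(code: str) -> tuple[int, int]:
    """Split a 'set,number' designation such as '1,95' into two integers."""
    set_part, number_part = code.split(",")
    # int() ignores surrounding whitespace, so '1, 95' is accepted as well.
    return int(set_part), int(number_part)


def wp_char(code: str) -> str:
    """Look up the character designated by a 'set,number' string."""
    return WP51_CHARS[parse_code(code)]


print(wp_char("1,95"), wp_char("1,117"))
```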


Typing accented characters: the Compose key 
Using the Compose key, you can type any character by entering its character set number and its position. To activate the Compose key, press Ctrl-V. At the bottom of the screen you see the prompt Key =, at which you enter the character's number. For example, to type the ą, press Ctrl-V and type 1,95 followed by Enter. In this way, any WP character can be entered.

Entering characters in this way is of course awkward, since nobody will be able to memorize each character's number. WP therefore allows you to use characters rather than numbers at the Compose key. Normal letters you type as such, while a number of accents are represented by a convention. For example, at the Compose key, WP interprets the comma as the cedilla and the ^ as the circumflex accent. Thus, to type the ç, press Ctrl-V and type ,c followed by Enter. And to type the ř, press Ctrl-V and type vr followed by Enter (the v designates the ˇ, the hacek accent). The order in which you type the accent and the character is immaterial.

Table 1 lists which keys are recognized as accents at the Compose key and which characters can be typed. The first column gives the keys representing the accents in the second column. So you see that since the semicolon represents the ogonek (the Polish hook), you type ;a at the Compose key to enter the ą. The table lists only lower case letters, but the corresponding upper case letters can be entered as well: Ctrl-V 'A enters the Á. The last four lines in Table 1 show some other characters that can be entered using the Compose key. Thus, to enter the ¿, press Ctrl-V and type ?? followed by Enter.
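The Compose behaviour described above (an accent key plus a letter, in either order, plus a few special key pairs) can be modelled in a few lines. This is a sketch, not WordPerfect's implementation: it assumes that Unicode combining marks with NFC normalization stand in for WP's character sets, and it covers only the accent keys named in the text.

```python
import unicodedata

# Accent keys named in the text, mapped to Unicode combining marks.
# Assumption: Unicode stands in for WP's own character sets.
ACCENT_KEYS = {
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # umlaut (diaeresis)
    ",": "\u0327",  # cedilla
    ";": "\u0328",  # ogonek
    "v": "\u030c",  # hacek (caron)
}

# Key pairs that produce a character other than accent + letter.
SPECIAL_PAIRS = {("?", "?"): "\u00bf"}  # inverted question mark


def compose(first: str, second: str) -> str:
    """Combine two keystrokes; accent and letter may come in either order."""
    if (first, second) in SPECIAL_PAIRS:
        return SPECIAL_PAIRS[(first, second)]
    for accent, letter in ((first, second), (second, first)):
        if accent in ACCENT_KEYS and letter.isalpha():
            # NFC merges letter + combining mark into one precomposed character.
            return unicodedata.normalize("NFC", letter + ACCENT_KEYS[accent])
    raise ValueError(f"no composition for {first!r} + {second!r}")


print(compose(",", "c"), compose("r", "v"), compose("?", "?"))
```

Trying both orders is what makes the order of accent and letter immaterial; a key such as v works as an accent only because it is listed in the accent table.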


Table 1. Accent designations in the Compose key. 


Key   Accent        WP characters
'     acute
      grave
^     circumflex
      tilde
;     ogonek
      slash
.     overdot
      centred dot
,     cedilla
      corona
"     umlaut
v     hacek
      macron





As you can see, you can enter the i by typing .i in the Compose key. However, this is not the ‘normal’ i, but the ı with a dot (a dotted dotless i, so to speak). This is the Turkish i, and if you type Turkish you are advised to use it rather than the normal i, for two reasons. Firstly, it ensures that ı and i are sorted correctly: in Turkish, i follows ı, but if you type the normal i, it will be sorted before ı. Secondly, if you enable kerning, WP automatically creates ligatures such as fi. It will do this for any sequence of fi and fl and, if you have expert fonts, ff, ffi and ffl. But naturally, to keep fi distinct from fı in Turkish, fi should not be turned into a ligature, and using the Turkish i prevents this.
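The sorting problem comes from the difference between code-point order and Turkish alphabetical order. The Python sketch below (an illustration only, not WP's sort routine) ranks letters by their position in the Turkish alphabet, so that ı comes before i; it assumes lower-case input and places characters outside the alphabet after it.

```python
# Turkish alphabetical order: dotless ı comes before dotted i.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {letter: i for i, letter in enumerate(TURKISH_ALPHABET)}


def turkish_key(word: str) -> list[int]:
    """Sort key for lower-case Turkish words; other characters sort last."""
    return [RANK.get(ch, len(TURKISH_ALPHABET) + ord(ch)) for ch in word]


words = ["isim", "ılık", "kız", "ırmak"]
print(sorted(words))                   # ['isim', 'kız', 'ılık', 'ırmak']
print(sorted(words, key=turkish_key))  # ['ılık', 'ırmak', 'isim', 'kız']
```

Plain code-point order puts every word beginning with ı after k, because ı lies far beyond the ASCII range.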

Another thing is that some accents are not available in the Compose key. Notable examples are the Hungarian umlaut (˝) and the breve accent (˘), used, for example, in Turkish. To type letters with these accents, you need to enter the numerical code in the Compose key, such as Ctrl-V 1,117 to enter the ğ. However, this can be remedied with a key macro, which I will discuss in the next section. (In WP 6 the u can be used to enter the breve accent in the Compose key.)

Apart from the characters mentioned above, some 
other characters can be entered using mnemonic 
keys in the Compose key. For completeness’ sake
these characters are listed in Table 2. 


Table 2. Miscellaneous characters 


Keys    Result    Keys    Result


Overstrike 
Although the WP character sets contain almost all known accented characters, some are not included. For example, the Welsh ẅ and some Slovene accents are not in character set 1. You can, however, create any character yourself using the Overstrike feature. As its name suggests, the Overstrike feature prints two characters in the same position.
Let us say you want to create the ẅ. To do so, go to the Overstrike feature: press Shift-F8, 4, 5, 1 (Format, Other, Overstrike, Create). At the bottom of the screen you will now see the [Ovrstk] prompt, at which you enter the two characters. To create the ẅ, type "w and press Enter until you are back at the edit screen. In the edit screen you will see only the second character that you typed (in this case w). But if you activate the Reveal Codes screen, you see the Overstrike character displayed as [Ovrstk:"w].

However, you must be careful with accents that you type at the Overstrike prompt. The ẅ is an interesting example, because if you enter it using the " key, it will be printed with a straight double quotation mark rather than an umlaut! So at the Overstrike prompt, you cannot use the conventional characters listed in Table 1. Rather, you must use a floating accent from set 1. Now, the ¨ is character 1,7. So to create the ẅ correctly, do as follows: go to the Overstrike prompt (Shift-F8, 4, 5, 1). Now press Ctrl-V to activate the Compose key and type 1,7 followed by Enter. Finally, type the w and press Enter until you are back at the edit screen. If you now look in the Reveal Codes screen, the character you just created is displayed as [Ovrstk:¨w]. To see some more information, place the cursor on the Overstrike character and you will see it displayed as [Ovrstk:[¨:1,7]w].

The order in which you enter characters at the 
Overstrike prompt is not important. But since you 
will see only the second character of an overstrike 
pair, it is convenient to enter the accent first and then 
the letter. In the print preview (Shift-F7, 6) you can 
see how the characters will print. 
The Overstrike feature is quite powerful, but it
has some disadvantages. For example, words con- 
taining an Overstrike character are not sorted 
correctly; they are not added correctly to the supple- 
ment word list during spell checking; they are lost 
when you save the document as a DOS text file; and 
although you can search for the Overstrike code, you 
cannot search a particular Overstrike, nor can you do 
a find-and-replace in Overstrike characters. (The last 
point has been remedied in WP 6.) 

We may conclude that WP has some convenient 
features to enter accented characters and special 
symbols. Nevertheless, if you need to enter a limited 
number of characters very frequently, even the Com- 
pose key becomes awkward. But WP offers another 
facility to enter special characters conveniently, 
namely the customizable keyboard. This will be 
taken up in the next section.


The keyboard editor 

Most languages use only a few accented characters 
very frequently. It is then not very handy to enter 
them using the Compose key, since flexible as it may 
be, it does need a handful of keystrokes. To handle 
this inconvenience, you can assign any character to 
virtually any key or key combination. In this section 
I will show a number of ways that can be used to 
reconfigure the keyboard. 

Key assignments are in fact small macros that are assigned to particular keys. Indeed, the keyboard editor is identical to the macro editor. It is beyond the scope of this paper to explain the full operation of the keyboard editor; rather, I will assume knowledge of it. Most books on WP have a section on this subject; for example, my book on WP characters and languages contains all the necessary background information¹. Below I will make some suggestions for key assignments and discuss key macros that may make life simpler.

The most obvious thing to do (and this is done very frequently) is to assign particular characters to particular keystrokes. This is useful if you need certain characters often. For example, in Dutch only three accented characters are used frequently: é, ë and ï. It would therefore be convenient to be able to enter these characters by pressing one key, let us say Ctrl-I to enter the ï. This is easily done in the keyboard editor. (Note that in Ctrl-letter combinations, you can use only lower case letters. Thus, it is not possible to define Ctrl-i and Ctrl-I as two distinct keystrokes.)

I mentioned that you can assign a macro of any 
complexity to a key for special purposes. Let me give 
a few examples. I will begin with some relatively 
simple examples, and finish with a rather more com- 
plex one. 

If you type words separated by a slash (such as man/woman), it would be convenient to insert a so-called invisible hyphen after the slash, so that this ‘word’ can be broken correctly after the slash at the end of a line. The easiest way to accomplish this is to define the / key such that when you press it, the invisible hyphen is inserted automatically. To do so, assign the following macro to the slash key:

/{Home}{Enter}
With this key assignment, you don’t have to think 
about inserting the invisible hyphen any more. 
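The effect of the break code after the slash can be illustrated with a toy greedy line breaker. In this hypothetical Python sketch a line may end after a space or after a slash, the latter standing in for the break opportunity that the slash-key macro adds; WP's own line breaking is of course more elaborate.

```python
import re


def wrap(text: str, width: int) -> list[str]:
    """Greedy line breaking; a line may end after a space or after a slash."""
    # Each piece ends just after a break opportunity (space, slash or end of text).
    pieces = [p for p in re.findall(r"[^ /]*(?:[ /]|$)", text) if p]
    lines: list[str] = []
    current = ""
    for piece in pieces:
        if current and len((current + piece).rstrip()) > width:
            lines.append(current.rstrip())
            current = piece
        else:
            current += piece
    if current:
        lines.append(current.rstrip())
    return lines


print(wrap("the man/woman problem", 10))  # ['the man/', 'woman', 'problem']
```

Without the break opportunity after the slash, man/woman would have to move to the next line as a single unit.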

The next example is useful if you type Portuguese or Polish. These two languages share the rule that if a word containing a hyphen is broken at that hyphen at the end of a line, the hyphen is doubled. For example, when Polski-Fiat is broken at the end of a line, Polski- ends one line and -Fiat begins the next. In WP, you can make the hyphens behave correctly for Polish and Portuguese if you enter them as the combination of the soft hyphen and the hard hyphen. To have these two distinct hyphens inserted by pressing just the - key, assign the following macro to the - key in the keyboard editor:

{Shy}{Home}- 
{Shy} stands for soft hyphen, which is the hyphen 
inserted by the hyphenation module; {Home}- are 
the keystrokes required to enter the hard hyphen, 
which is the hyphen that is always visible. 
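The rule that the soft-hyphen plus hard-hyphen combination implements can be stated as a small function. This Python sketch models only the printed result of a break at the hyphen; the language labels are hypothetical strings, not WP language codes, and the single-hyphen behaviour for other languages is the assumption stated in the docstring.

```python
def break_compound(word: str, language: str) -> tuple[str, str]:
    """Split a hyphenated compound at its first hyphen for a line break.

    Polish and Portuguese repeat the hyphen at the start of the next line;
    other languages (assumed here) print it only once, at the end of the line.
    """
    left, right = word.split("-", 1)
    if language in {"polish", "portuguese"}:
        return left + "-", "-" + right
    return left + "-", right


print(break_compound("Polski-Fiat", "polish"))  # ('Polski-', '-Fiat')
```

In the WP macro, the soft hyphen supplies the hyphen that appears only at a line end, and the hard hyphen supplies the one that is always printed.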

With the next example I come back to my prom- 
ise to show how omissions in the Compose characters 
can be remedied. I mentioned that, contrary to what 
you would expect, the u does not enter the breve 
accent in the Compose key (in WP 6 it does). But it is 
not difficult to create your own Compose character. 
The following macro takes care of that: 


u
{IF}{SYSTEM}137~=32790~
{ELSE}
{RETURN}
{END IF}
{CHAR}ch~{Enter}
{CASE}{VARIABLE}ch~~A~u1~a~u2~G~u3~g~u4~U~u5~u~u6~Y~u7~y~u8~~
{RETURN}
{LABEL}u1~{NTOK}1,90~{RETURN}
{LABEL}u2~{NTOK}1,91~{RETURN}
{LABEL}u3~{NTOK}1,116~{RETURN}
{LABEL}u4~{NTOK}1,117~{RETURN}
{LABEL}u5~{NTOK}1,188~{RETURN}
{LABEL}u6~{NTOK}1,189~{RETURN}
{LABEL}u7~{NTOK}1,224~{RETURN}
{LABEL}u8~{NTOK}1,225~{RETURN}


With this macro assigned to the u key, the u behaves as the breve accent in the Compose key. Thus, you can type ug in the Compose key to enter the ğ.

To conclude this section, and to link smart key- 
boards to the language code, I will give an example 
of a way to handle multilingual documents. Suppose 
that you use two languages: English and Russian. 
What you need is a keyboard that enables you to type 
English and Russian (in the Cyrillic alphabet) and a 
key that inserts the correct language code. We'll start
with the key that inserts the language code. Take the 
following macro: 

{DISPLAY OFF}
{IF}"{SYSTEM}32~"="UK"~
{Format}44RU{Enter}{Exit}
{ELSE}
{Format}44UK{Enter}{Exit}
{END IF}

What this macro does is the following. When activated, it checks which language code is active (system variable 32 holds the current language code). If English (UK) is active, the Russian language code (RU) is inserted; if it is not, the English code is inserted. It is convenient to assign this macro to a key that has no meaning in WP, such as the Alt-Enter combination.
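The toggle works because a language code applies from the point where it is inserted, so the language in effect is simply the most recent code. The Python sketch below models a document as a list of text and code items; the class and function names are hypothetical, and treating UK as the default when no code is present is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class LanguageCode:
    """Stand-in for a language code inserted in the document."""
    code: str


def current_language(doc: list, default: str = "UK") -> str:
    """The code in effect at the end of the document (the most recent one)."""
    for item in reversed(doc):
        if isinstance(item, LanguageCode):
            return item.code
    return default


def toggle_language(doc: list) -> None:
    """Insert RU if UK is active, otherwise UK, as the macro does."""
    doc.append(LanguageCode("RU" if current_language(doc) == "UK" else "UK"))


doc = ["Hello "]
toggle_language(doc)
print(current_language(doc))  # RU
toggle_language(doc)
print(current_language(doc))  # UK
```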

Now for the keys. It is possible to create a keyboard in which each letter produces either a Latin or a Cyrillic character. We can do this by making each key sensitive to the language code. So, if the English language code is active, the d key should produce the d, and if the Russian language code is active, the д. Basically, this is a variant of the previous macro. Take the following macro:

{IF}"{SYSTEM}32~"="UK"~
d
{ELSE}
д
{END IF}

Assign this macro to the d key in the keyboard editor. Thus defined, the d key behaves as follows: when pressed, it first determines the current language code. If it is English, the d is inserted into the document, otherwise the д. Note that this key macro does not check for Russian. It assumes that if the UK code is not active you want Russian. This macro can therefore be used for other languages as well; just change the д to another character.

Although the macro discussed here works fine, it has one shortcoming. If you are accustomed to using the mnemonic letters rather than the numbers while navigating the WP menus, you cannot use these mnemonics if the Russian language code is active. For example, you can go to the line margin menu by pressing Shift-F8, l, m. But if Russian is active, you would have to use Shift-F8, 1, 7, since the l and the m then produce Cyrillic, which WP does not understand. Further, if Russian is active, you cannot type a file name when saving or retrieving a document, answer y or n to a WP question, and so on. So apart from making the keys sensitive to the language code, we also want to make them sensitive to whether or not we are in the edit screen. The general format of such keys is as follows:

{IF}{STATE}&4~
(do something)
{ELSE}
(alternative)
{END IF}

This general format is WP macro language for ‘if at the edit screen, do something, else do something else’. The edit screen here also includes headers, footers, endnotes and footnotes. The thing to do now is to embed the macro for the d key in this general format. The macro to be assigned to the d key will then look as follows:
{IF}{STATE}&4~
{IF}"{SYSTEM}32~"="UK"~
d
{ELSE}
д
{END IF}
{ELSE}
d
{END IF}
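The nested macro for the d key makes two decisions in a fixed order. The Python sketch below restates that logic as a function; the partial Cyrillic table and the parameter names are hypothetical, and, like the macro, it treats any language code other than UK as Russian.

```python
# Hypothetical excerpt of a phonetic Latin-to-Cyrillic layout.
CYRILLIC = {"d": "д", "i": "и"}


def key_output(key: str, at_edit_screen: bool, language: str) -> str:
    """What a language-sensitive key types, following the nested macro."""
    if at_edit_screen:
        # Inner IF: the language code decides between Latin and Cyrillic.
        if language == "UK":
            return key
        # Like the macro, any code other than UK is taken to mean Russian.
        return CYRILLIC.get(key, key)
    # Menus and prompts: always the Latin letter, so mnemonics still work.
    return key


print(key_output("d", True, "RU"), key_output("d", False, "RU"))  # д d
```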

To complete the keyboard, you should assign similar macros to each key. Fortunately, you can copy a macro from one key to another, which is convenient in our case. Do as follows: create the macro for the d key as described above, then go back to the edit screen to activate the keyboard layout with just this one d in it. Then go to the keyboard editor again. To copy the macro from the d to the i, type 1 (Create) in the keyboard editor and press Enter to enter the macro window. The i is in this window; delete it. Now press Ctrl-V and type d to copy the macro assigned to the d into the current window. Just replace the d with the i and the д with the и, and press F7 to save the changes. In this way it is not difficult to create a well-working bilingual keyboard.
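Because every key macro differs only in its Latin and Cyrillic letters, the text for each key can also be generated from a table and then typed or pasted in by hand. The Python sketch below is a hypothetical helper, not a WordPerfect tool: it reproduces the nested macro format for the d key given in this article, and the two-entry layout is only an excerpt.

```python
def key_macro(latin: str, cyrillic: str) -> str:
    """Return the nested key macro text for one Latin/Cyrillic key pair."""
    return "\n".join([
        "{IF}{STATE}&4~",
        '{IF}"{SYSTEM}32~"="UK"~',
        latin,
        "{ELSE}",
        cyrillic,
        "{END IF}",
        "{ELSE}",
        latin,
        "{END IF}",
    ])


LAYOUT = {"d": "д", "i": "и"}  # hypothetical excerpt of the full layout

for latin, cyrillic in LAYOUT.items():
    print(f"--- macro for the {latin} key ---")
    print(key_macro(latin, cyrillic))
```

The copying itself still has to be done in the keyboard editor; the script only spares you from editing each copy.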


WordPerfect 6 

Recently, WordPerfect released WP version 6. In 
this new version various linguistic characteristics are 
more refined. On the whole, I think that for anyone 
using foreign languages, WP 6.0 is a great improve- 
ment. A rather drastic change is the ability to edit in 
a graphic screen, in which every character is dis- 
played correctly: screen font editors are a thing of the 
past. Secondly, WP includes a large number of printer 
fonts in the package that enable you to print all 
characters. Fonts are included in Type 1 and Speedo 
format, and WP also supports TrueType and Agfa 
Intellifont. With the included font installer, fonts in these formats can be installed in the WP printer driver. Another big change is the macro language, which is completely new.


Character sets. Most character sets have changed in 
some way. Linguistically, the following changes 
have been made. 

Character set 1 has been modified to correct some 
errors and to add some characters. WP 5.1 docu- 
ments are updated automatically when you retrieve 
them in version 6. For example, the dotless i (ı) was 1,24 in WP 5.1, but is 1,239 in WP 6; when you retrieve a 5.1 document in 6, the 1,24 code is changed to 1,239 automatically.
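The automatic update on retrieval amounts to a renumbering table applied to every character code. The Python sketch below includes only the one change named in the article; that codes missing from the table pass through unchanged is an assumption of the sketch, not a documented WP rule.

```python
# Only the one renumbering named in the article; a real table would be longer.
WP51_TO_WP6 = {(1, 24): (1, 239)}  # dotless i


def upgrade_codes(codes: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Renumber WP 5.1 (set, number) codes; unknown codes pass through unchanged."""
    return [WP51_TO_WP6.get(code, code) for code in codes]


print(upgrade_codes([(0, 65), (1, 24)]))  # [(0, 65), (1, 239)]
```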

Character set 2 is new and linguists will love it: it 
contains 144 phonetic characters. 
Character set 8 (Greek): some minor changes.

Character set 9 (Hebrew): this character set has 
been reorganized considerably. 

Character set 10 (Cyrillic) is now Cyrillic/Georgian: the Cyrillic characters have been slightly modified, and Georgian characters have been added.

Character set 11 (Japanese). Drastically changed. 
In WP 5.1, this set contained the full Hiragana and 
Katakana sets; in WP 6, by contrast, it contains only 
62 Katakana characters. 

Character sets 13 and 14 are new: they contain 
Arabic and script Arabic characters. 


Compose key. Although it is mapped to another key (Ctrl-A; in WP 5.1 it was Ctrl-V), the Compose key works the same. The breve accent has been added as an accent that can be typed in the Compose key. You can use the letter u as a mnemonic.


Character window. This is new in WP 6, but you may know it from WP 5.2 for Windows: press Ctrl-W to get an on-screen overview of all character sets. You can use this window to insert characters in the document.


Overstrike. Using WP’s Search function, you can now search for a particular Overstrike character. You can also do a find-and-replace in Overstrike characters: search for one Overstrike character and replace it by another. In text mode, only the last character is displayed, as in WP 5.1; in graphics mode, all characters are displayed.

Word lists. The supplement word list, which is created and updated when you add words to it during spell checking, is no longer a standard WP document. You must now use a special menu to edit it. An interesting feature is that you can define automatic replacements in the word list. Suppose you want to change American to British spelling. You can include in the supplement list statements to the effect that center should be changed to centre, harbor to harbour, etc. Once these replacements are defined in the word list, they are applied automatically during spell checking.
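A replacement list of this kind is essentially a whole-word substitution table. The Python sketch below only illustrates the idea: the table is hypothetical (American to British, as in the example), matching is case-sensitive, and WP applies its replacements interactively during spell checking, whereas this sketch simply rewrites a string.

```python
import re

# Hypothetical replacement table (American to British spelling).
REPLACEMENTS = {"center": "centre", "harbor": "harbour"}


def apply_replacements(text: str, table: dict[str, str]) -> str:
    """Replace whole words only, so 'centered' is left alone."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


print(apply_replacements("the harbor is near the center", REPLACEMENTS))
# the harbour is near the centre
```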


Spell checker. The spell checker itself is essentially 
the same as the one in 5.1. But it is now possible to 
include codes to exclude parts of a document from 
spell checking. 


Grammar checking. WP 6 includes the Grammatik grammar checker, which is also included in WordPerfect for Windows and Word for Windows.


Macros. Although WP contains the basics of a good 
linguistic word processor, it basically lives on the 
macro language to drive the keyboard. This was true 
in WP 5.1, and is still true in WP 6.0. It is therefore a 
relief that the macro language in WP 6.0 is much 
more powerful than the one in WP 5.1. The new 
macro language is basically Turbo Pascal with a bit 
of C notation. Anyone who knows Pascal can write 
WP macros without any effort; you only need to get 
used to a few notational variants. For example, converting a Quicksort routine and a binary search function from Turbo Pascal to WP was a matter of minutes! WordPerfect includes a program to convert 5.1 macros to 6.0 format. Unlike the 4.2 to 5.0/5.1 converter, this program works very well. Even complex macros were converted successfully.
Abstract


The ESPRIT SUNDIAL project ran for five years, concluding in August 1993. The objective of the project was to 
design and build telephone-access spoken language interfaces to computer databases. After introducing the aims 
and objectives of the project, the problems of specifying an interactive system are outlined and the Wizard-of-Oz 
simulation method described. The architecture of the resulting system is introduced, and system transaction 
success results of up to 96.6% are reported. In the final section, some implications for machine translation, particularly interpretive telephony, are identified.


1. Introduction

The ESPRIT SUNDIAL (Speech UNderstanding and 
DIALogue) project ran for five years, finishing in 
August 1993. The objective of the project was to 
design and build spoken language interfaces to com- 
puter databases, capable of supporting telephone 
access by members of the public. After introducing 
the aims and objectives of the project (section 2), the 
problems of specifying an interactive system are 
outlined and the solution adopted in SUNDIAL de- 
scribed (section 3). The architecture of the resulting 
system is introduced (section 4), and performance 
results reported (section 5). In the final section (sec- 
tion 6), some implications for machine translation 
and, particularly, interpretive telephony are identi- 
fied. 


2. Aims and objectives 

The SUNDIAL project aimed to make significant 
advances in the state-of-the-art in spoken language 
processing”. It did so by setting a very ambitious 
target, namely to produce computer systems capable 
of participating in natural spoken language task- 
oriented dialogues over the telephone, for each of 
English, French, German and Italian. The systems 
should be speaker-independent and should support 
large vocabulary (around 1,000 words) speech rec- 
ognition. The tasks chosen were flight information 
and reservations (English and French) and train time- 
table information (German and Italian). In order to 
achieve this goal it would be necessary to deliver 
respectable performance in each of the component 
technologies: signal processing, speech recognition, 
parsing, dialogue management, message generation 
and speech synthesis. Not only would these technologies have to function well in their own right, they would also have to be fully integrated in a single system with all the other speech and language components and with an application database.

This kind of approach is not without controversy. 
The very large DARPA ATIS program in speech 
understanding has placed primary emphasis on im- 
proving the component technologies and, in 
particular, on optimizing speech recognition?. By 
contrast, SUNDIAL's focus on integration reflects 
the belief that component technologies do not have 
to be optimal, so long as they are good enough to 
contribute positively to overall system usability. Af- 
ter all, humans are not capable of perfect speech 
recognition, but high level interpretive competence 
and effective use of heuristics (including asking a 
speaker to repeat an utterance) allow communication 
to proceed even in noisy environments. 

The objective of allowing telephone access to 
spoken dialogue systems has two principal 
motivations. First, the telephone has a very promis- 
ing future as an interface. Almost every home and 
office already has a telephone installed, so it is not
necessary to buy any new equipment in order to
access remote data and services. There is a large 
number of services in existence which allow touch- 
tone telephone access and a small but growing number 
of services which make use of limited speech recog- 
nition (e.g. home banking services). SUNDIAL's 
choice to target telephone quality speech is a clear 
endorsement of the view that speech recognition 
over the telephone will be one of the major technol- 
ogy growth areas towards the end of this century. 

A second reason for concentrating on telephone 
quality speech is that recognizing speech over the 
telephone is somewhat harder than recognizing speech 
directly using a microphone. The human speech 
signal occupies a frequency range of approximately 
0-10000 Hz, but telephone lines are limited to a 
range of 400-3400 Hz. Thus, a significant amount of
potentially discriminative information is lost in tel- 
ephone quality speech. The added difficulty of 
working in this environment provides strong motiva- 
tion for improving the quality and robustness of 
dialogue management and overall system integra- 
tion. 
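A rough calculation with the figures above shows the scale of the loss, if frequency range is taken as a crude proxy for the information carried. It says nothing about which frequencies are most useful for recognition; it only measures how much of the range the telephone channel passes:

```latex
\frac{3400\,\mathrm{Hz} - 400\,\mathrm{Hz}}{10000\,\mathrm{Hz} - 0\,\mathrm{Hz}}
  = \frac{3000}{10000} = 0.3
```

That is, about 30% of the 0-10000 Hz range survives, and about 70% of it is cut off.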

SUNDIAL ran from late 1988 to August 1993 
and involved 170 person years of effort. At that time, 
it was the largest speech and language project in 
Europe. The project involved partners from five 
European countries: Logica (latterly Vocalis) and 
the University of Surrey in the U.K.; CNET, CAP
Gemini Innovation and IRISA/University of Rennes
in France; CSELT, Saritel and Politecnico di Torino 
in Italy; Daimler Benz, Siemens and the University 
of Erlangen in Germany; and Infovox in Sweden. 
Though the objective was to produce distinct dem- 
onstrators for the four languages, English, French, 
German and Italian, there was a strong commitment 
to converge on as many aspects of the technology as 
possible. In this way it was hoped that general les- 
sons would be learned about dialogue management 
which rose above the fine detail of any one of the 
actual languages investigated. 


3. Specifying the problem 

The fact that dialogue involves two parties, each of 
whose behaviour conditions the behaviour of the 
other, has serious consequences for the design of a 
spoken language system. There is no point in design- 
ing a computer dialogue system which takes no 
account of how users will behave when presented 
with it. Unfortunately, it is not possible reliably to 
predict how users will behave with such a system 
until it, or something very like it, exists. 

The approach adopted in SUNDIAL was to col- 
lect and analyze corpora of dialogues in which real 
users called existing (human) telephone services. 
For example, for the English system a corpus of 
telephone calls to British Airways’ (BA) flight infor- 
mation service was collected and examined. This 
was used to bootstrap a series of so-called Wizard- 
of-Oz simulations’ in which experimental subjects 
believed they were talking to a computer system; in 
reality, they were talking to a person (the ‘wizard’) 
whose voice had been filtered through a device to 
make it sound synthetic. 

In the first simulation, subjects were asked to 
carry out a set of tasks derived from the BA corpus; 
the wizard was required to use the actual words 
uttered by the BA agent whenever possible. Thus, 
the main factor being investigated was the effect of 
the user’s belief about the identity of the dialogue 
partner on the user’s utterances. It was discovered 
that when users believed they were talking to a 
computer they constrained their language signifi- 
cantly by comparison with the human-human 
language found in the BA corpus”. The constraints 
were broadly those found in ‘speech to foreigners’: 
fewer words were uttered, a smaller vocabulary was
used, there was less reliance on complex grammati- 
cal constructions (such as relative clauses), and the 
incidence of talk-in-overlap (i.e. both parties talking 
at the same time), common in the human-human 
condition, virtually disappeared. 

A series of simulations was carried out, with the 
lessons of each one feeding into the next. In the later 
simulations, some subcomponents of the real 
system were combined in ‘bionic Wizard-of-Oz’ 
simulations¹⁶. This ‘iterative design’ methodology"
made it possible to converge on a practical specifica- 
tion, sensitive to the needs of both system and users. 


4. System architecture 

The overall architecture developed is shown in Figure 1. One of the objectives of the architecture was to
provide a close coupling between the modules, with
the aim of using appropriate knowledge and con- 
straints in the process of understanding users’ 
utterances and recovering from errors. An example 
of the interaction between modules is the application 
of predictions derived from the dialogue context to 
the recognition process", As the dialogue progresses, 
the Dialogue Manager is able to select appropriate 
modes of interaction (such as spelling when a place 
name is consistently mis-recognized), and prompt 
the user accordingly. 


Figure 1: SUNDIAL system architecture (modules: Front End Processing, Language Processing, Dialogue Management, Computer Information System, Message Generation, Text to Speech)

Front end processing 


The Front End Processing module carries out acoustic-phonetic decoding of the incoming speech signal and produces a lattice or graph of word hypotheses. For all four languages, front end processing is based on Hidden Markov Modelling of sub-word units. A small number of keywords (such as ‘yes’ and ‘no’) are modelled separately to improve performance.

Speaker independent sub-word models were con- 
structed on the basis of a number of large multi- 
speaker corpora of speech recorded over the telephone‘, 


Linguistic processing 

The formalisms used in SUNDIAL are Unification 
Categorial Grammar for English and French?, Aug- 
mented Phrase Structure Grammar for German”, and 
Dependency Grammar for Italian?. Two different 
parsing strategies have been used: left-to-right bot- 
tom-up parsing? and island parsing which selects 
starting points for parsing on the basis of the best 
acoustic scoring hypotheses’. 

Regardless of the parser type and the underlying 
formalism, all the linguistic processing systems deliver 
their results in a common format, using a semantic
knowledge representation language called SIL". This 
has the effect of making the various different con- 
figurations of front end processing and linguistic 
processing indistinguishable from subsequent proc- 
esses in the chain of interpretation and generation. 


Dialogue management 

The input to the Dialogue Manager is a context- 
independent interpretation of an utterance, expressed 
in SIL. The Dialogue Manager takes this interpreta- 
tion and particularizes it to the current dialogue 
context. This includes finding precise referents for 
expressions such as ‘the flight’, and resolving pronominal references such as ‘it’ and deictic expressions such as ‘then’. Even the best of speech recognizers is
not fool-proof, so the Dialogue Manager must be 
capable of establishing a reasonable degree of cer- 
tainty about what has been said by initiating 
confirmation sub-dialogues where appropriate. It 
must be capable of sorting out confusing or incon- 
sistent information, and it must do all this in a 
manner which is both painless and reasonable to the 
user. Speech understanding is a canonical example 
of reasoning in the face of extreme uncertainty; it is 
the task of the Dialogue Manager to co-ordinate the 
whole complex task of interpretation, and to estab- 
lish a reasonable degree of mutual confidence between 
the dialogue participants". 

Whereas each of the four language systems 
adopted a separate combination of technologies for 
SIL production, it was decided early in the project to 
converge on a single Dialogue Manager. This ge- 
neric system would encode universal knowledge 
about dialogue (e.g. ‘a question begs an answer’) and 
about major categories of task (e.g. providing infor- 
mation from a database on request). The generic 
system could then be customized to work with some 
particular task by supplying an appropriate 
customization knowledge base. The general principles
of task management, combined with the specific
customization information, constituted the interface
to the application database. 

The output of the Dialogue Manager is also a SIL 
structure encoding the propositional content and dis- 
course function of what the system ‘wants’ to say 
next. Thus, the only language the Dialogue Manager 
knows about is SIL. The system's dialogue strategy
can also be customized easily: for example, whether
the system typically behaves as in (1) or (2) can be
determined at customization time.


1. (Explicit confirmation) 
System: Where does the flight leave from? 
User: Berlin. 
System: Was that from Berlin? 
User: Yes. 
System: What time of day does it leave? 


2. (Implicit confirmation) 
System: Where does the flight leave from? 
User: Berlin. 
System: From Berlin. What time of day does it 
leave? 
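A minimal sketch of how a single customization setting might switch between the two behaviours shown in (1) and (2). The names ConfirmationStrategy and next_turn are hypothetical, and the prompt templates simply reproduce the example wording; the actual Dialogue Manager operated on SIL structures rather than on strings.

```python
from enum import Enum


class ConfirmationStrategy(Enum):
    EXPLICIT = "explicit"  # ask a yes/no question before moving on, as in (1)
    IMPLICIT = "implicit"  # echo the value and move on at once, as in (2)


def next_turn(strategy: ConfirmationStrategy, slot_value: str, next_question: str) -> str:
    """Return the system's next prompt after the user supplies a departure point.

    Hypothetical helper: under EXPLICIT the next question is deferred until
    the user confirms; under IMPLICIT the value is restated and the next
    question is asked in the same turn.
    """
    if strategy is ConfirmationStrategy.EXPLICIT:
        return f"Was that from {slot_value}?"
    return f"From {slot_value}. {next_question}"
```

The implicit strategy saves a turn per parameter, at the cost of making the user interrupt if the echoed value is wrong.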


The generic Dialogue Manager was a very useful 
tool for investigating a wide variety of phenomena in 
a language-neutral, task-independent way. It also 
raised a number of valuable issues relevant to spoken 
language translation, as we shall see below. How- 
ever, the very flexibility of the generic Dialogue 
Manager, coupled with the software engineering chal- 
lenge of managing a piece of software written at sites 
in four countries, sometimes made it a difficult tool 
to work with. For this reason, a local variant of the 
generic Dialogue Manager was developed for each 
of the languages in the project. These local variants 
embodied the principal insights from the generic 
system in a framework specially tuned for the needs 
of the target application. 


Message generation 

The output from the Dialogue Manager was a SIL 
structure containing information about both the con- 
tent and function of the next system utterance. The 
task of the Message Generator was to take this infor- 
mation and turn it into a string of words in a specified 
natural language. Message generation in the context 
of a dialogue must be sensitive to what has been said 
previously by both system and user. The system 
maintained a detailed Linguistic History for this
purpose”. As well as choosing an appropriate string
of words, the Message Generator was required to
select an appropriate intonational contour for its
utterance. This was signalled by means of special
markers in the string output by the Message
Generator’.




The SUNDIAL speech understanding and dialogue project 





Speech synthesis 

Text-to-speech synthesis technology for each of the 
four languages in SUNDIAL was based on existing 
technology. Two different varieties of synthesizer
were used. Diphone synthesizers splice together a
sequence of very short stretches of human speech.
Synthesizers of this type can display a very natural
voice quality. Formant synthesizers generate the 
speech sounds artificially. This typically results in 
poorer voice quality but greater control over the 
intonation contour. A number of acceptability trials 
of the different synthesizers were conducted with 
non-expert subjects”. 


4. Evaluation results

The whole question of how to evaluate the perform- 
ance of an interactive dialogue system has not yet 
been answered satisfactorily. It is simplistic to sup- 
pose that a single metric (such as a score between 0 
and 100) could be used. It seems much more promis- 
ing to suppose that a dialogue system could be 
characterized by an array of quantitative results cou- 
pled with a set of qualitative judgements on such 
aspects as usability and pleasantness. This is the
approach taken in SUNDIAL”. 

Evaluation metrics can be divided into two vari- 
eties. Black box metrics consider the performance of 
the whole system without reference to any of its 
internal details. Glass box metrics look inside the 
system and monitor the performance of the compo- 
nent technologies. Broadly speaking, glass box 
metrics are useful diagnostics during system devel- 
opment, while black box metrics are suitable for 
characterizing the ‘goodness’ of the system at achiev- 
ing its ultimate objectives. 


Glass box metrics 

Glass box metrics which have played an important 
role in SUNDIAL are word accuracy, sentence accu- 
racy and information content. Word accuracy is a 
measure of the speech recognizer’s ability to recog- 
nize the words spoken in each utterance. As well as 
measuring correct word recognitions, word inser- 
tions, deletions and substitutions are recorded. 
Sentence accuracy is a straightforward measure of 
the percentage of sentences which were recognized 
perfectly, i.e. without insertions, deletions or substi- 
tutions. It is a rather more demanding measure than 
word accuracy, since it is possible to obtain a high 
word accuracy but a low sentence accuracy by recog- 
nizing most, but not all, words in most utterances. 
The problem with sentence accuracy is that it does 
not measure recognition at a level of granularity 
which is relevant to the task in hand. 
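As a sketch of how these two measures are usually computed, the code below aligns a recognized word string against a reference with a minimum-edit (Levenshtein) alignment, then applies the common definition word accuracy = (N − S − D − I) / N, where N is the number of reference words. That SUNDIAL's scoring tools used exactly this formulation is an assumption, and tokens are compared as exact strings, so the text is assumed to be already normalized (case, punctuation).

```python
def align_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total_errors, S, D, I) aligning reference[:i] with hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # reference words with nothing recognized: deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # recognized words with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                match = (t, s, d, ins)
            else:
                match = (t + 1, s + 1, d, ins)
            t, s, d, ins = cost[i - 1][j]
            delete = (t + 1, s, d + 1, ins)
            t, s, d, ins = cost[i][j - 1]
            insert = (t + 1, s, d, ins + 1)
            cost[i][j] = min(match, delete, insert)  # fewest total errors wins
    _, s, d, ins = cost[n][m]
    return s, d, ins


def word_accuracy(reference: str, hypothesis: str) -> float:
    """(N - S - D - I) / N; can go negative if there are many insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    s, d, i = align_errors(ref, hyp)
    return (len(ref) - s - d - i) / len(ref)


def sentence_accuracy(pairs) -> float:
    """Fraction of (reference, hypothesis) pairs recognized with no errors at all."""
    perfect = sum(1 for ref, hyp in pairs if ref.split() == hyp.split())
    return perfect / len(pairs)
```

A single misrecognized word costs one utterance only a little word accuracy but its entire sentence accuracy, which is the contrast the paragraph draws.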

Most people are familiar with the experience of 
missing part of what someone has said but, nonethe- 
less, grasping the ‘gist’ of the utterance. Consider the 
following utterance: 

3. I was wondering whether you might be able to 
tell me the arrival time of BA 123, please. 




Suppose this was misheard as: 


4. I wonder whether you might be able to tell me the 
arrival time of BA 123, please. 


From almost every point of view, the differences
between these two sentences are trivial. However,
they are enough to drag down the word accuracy score
and to give sentence (4) a sentence accuracy score of 0.
It is helpful here to identify how much of the sentence
actually contributes to the achievement of the task the
user is attempting to perform. From this point of view,
relatively few words in the utterance are important.
Much of (3) could be ignored without loss of relevant
information. Utterance (5) is sufficient to progress the
dialogue satisfactorily.


5. Tell me the arrival time of BA 123 


The information content measure is used to as- 
sess how effective the recognizer and parser are 
together in identifying the task parameters men- 
tioned in the utterance. Once again, the methodology 
is to compare what was actually recognized against a
reference answer, and to count correct recognitions, 
insertions, deletions and substitutions. 
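The same counting can be applied at the level of task parameters rather than words. In this sketch the recognizer-plus-parser output is assumed to be a flat {parameter: value} dictionary; the parameter names such as "flight" and "request" are hypothetical, since SIL structures were considerably richer.

```python
def slot_errors(reference: dict, recognized: dict) -> dict:
    """Score recognized task parameters against a reference answer.

    correct:       parameter present with the right value
    substitutions: parameter present with a wrong value
    deletions:     reference parameter missing from the output
    insertions:    output parameter absent from the reference
    """
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    for name, value in reference.items():
        if name not in recognized:
            counts["deletions"] += 1
        elif recognized[name] == value:
            counts["correct"] += 1
        else:
            counts["substitutions"] += 1
    counts["insertions"] = sum(1 for name in recognized if name not in reference)
    return counts
```

Under this measure, utterances (3), (4) and (5) would all score perfectly if each yielded the same two parameters, which is the point of measuring at task granularity.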


Black box metrics 

The most important black box metric is transaction 
success. This is a measure of whether or not the 
system succeeds in carrying out some appointed task 
and delivers a solution which accords with the facts. 
Possible values for this metric are (i) succeed, (ii) 
succeed with constraint relaxation, (iii) succeed in 
spotting that no answer exists, and (iv) fail. The fact 
that a system achieves a high degree of transaction 
success does not guarantee that it will be usable. It 
may be rendered useless by virtue of the slowness of 
its performance, for example. 
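A minimal sketch of tallying the four outcome values listed above. Treating outcomes (i) to (iii) all as successes is an assumption made for illustration; the Outcome names are hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    SUCCEED = "succeed"
    SUCCEED_WITH_RELAXATION = "succeed with constraint relaxation"
    NO_ANSWER_EXISTS = "succeed in spotting that no answer exists"
    FAIL = "fail"


def transaction_success_rate(outcomes) -> float:
    """Fraction of dialogues whose outcome is anything other than FAIL (assumed scoring)."""
    tally = Counter(outcomes)
    return 1 - tally[Outcome.FAIL] / len(outcomes)
```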

Average dialogue duration is another measure 
which is relevant for characterizing dialogue sys- 
tems. However, this misses the fact that some tasks 
genuinely require much more time than others. One 
way around this is to calculate the system's average 
response time. This conveys a sense of how long the 
user has to wait for the system to respond. This 
metric has to be applied with caution, however, since
some tasks necessarily involve lengthy delays. For 
example, callers to existing flight information and 
reservation services often have to wait a minute or 
more for a response while the agent queries a data- 
base. 
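Average response time can be sketched from a turn log. The log format here, a list of (user_turn_end, system_turn_start) pairs in seconds, is a hypothetical assumption; the point is only that the measure averages the waits between turns rather than the length of the whole dialogue.

```python
from statistics import mean


def average_response_time(turns) -> float:
    """Mean delay, in seconds, between the end of each user turn and the system's reply."""
    delays = [start - end for end, start in turns]
    if any(d < 0 for d in delays):
        raise ValueError("a system reply starts before the user turn ends")
    return mean(delays)
```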

A more subtle metric is the turn correction ratio. 
This gives the percentage of users' turns in a corpus 
which are devoted to correcting some failure on the 
system's part. For example, the system may mishear 
or misunderstand the user. The insight behind this 
metric is that it expresses the percentage of the 
dialogue which is not devoted to progressing through 
the task. 
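The ratio itself is simple once each user turn has been labelled. This sketch assumes a hypothetical annotation in which an annotator marks each user turn True if it was spent correcting the system.

```python
def turn_correction_ratio(correction_flags) -> float:
    """Percentage of user turns flagged as corrections of a system failure."""
    if not correction_flags:
        raise ValueError("no user turns to score")
    return 100.0 * sum(correction_flags) / len(correction_flags)
```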

User confidence is a key factor in spoken dia- 
logue systems. One of the principal threats to user 
confidence is the production of system utterances 
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which appear to bear no relation to the rest of the 
dialogue or, worse still, contradict what has gone 
before. The contextual appropriateness measure 
records the percentage of utterances which are judged 
to be appropriate in context. 


Results 

Taken together, these metrics characterize the per- 
formance of spoken language dialogue systems. It 
would take more space than is available here to 
present the full results for all the SUNDIAL systems; 
these can be found in Ciaramella*. Here we present 
just the transaction success results for the Italian and 
English systems. 

Four trials of the Italian system were carried out’. 
The variables which were investigated were user 
expertise (naive/expert) and Dialogue Manager (ge- 
neric/Italian). All of the trials were carried out over 
private branch exchange (PBX) lines, i.e. lines con- 
trolled by a private switchboard. The results are 
reported in Table 1. 


Italian prototype | Trial 1 | Trial 2 | Trial 3 | Trial 4
Subjects | naive (10m, 10f) | naive (10m, 10f) | expert (11m, 4f) | expert (11m, 4f)
Speech input quality | PBX | PBX | PBX | PBX
Dialogue Manager | Italian variant | generic DM | Italian variant | generic DM
Number of dialogues | 63
Transaction success | 77.6% | 51% | 96.6% | 83.3%


Table 1: Italian transaction success results over the 
PBX 








In the first and second trials, the subjects were 
naive users who were given tasks to perform. They 
were not given any special instructions about what
they could or could not say. The transaction success 
results show significantly better performance from 
the Italian variant of the Dialogue Manager, reflect- 
ing the additional constraints it embodied. Trials 3 
and 4 also contrast the different versions of the 
Dialogue Manager, this time using members of the 
Italian SUNDIAL project team. Once again, the Ital- 
ian variant Dialogue Manager produces better results 
than the generic Dialogue Manager. It is noteworthy 
that the system performs significantly better with 
expert users than with naive users for both versions 
of the Dialogue Manager. 

How should these results be interpreted? Trial 
1 shows that more than three out of every four tasks 
attempted by naive users complete successfully.
When the system does not succeed, it does not fail
drastically. Rather, it recognizes that it is not making
reasonable progress and elects to pass the caller to a
human agent for task completion. The transaction
success result in Trial 3 represents a very respectable
target towards which users may aspire as they become
more skilled in using the system (more than 19 out of
every 20 dialogues succeed). The nature of the skills 
acquired by experienced users is difficult to pin 
down exactly, but is likely to include the tacit knowl- 
edge absorbed by the user about how to speak in 
order to get the most out of the speech recognizer, 
and which grammatical constructions produce the 
best results. 

A clear lesson from these trials of the Italian 
prototype system is that over-the-telephone task- 
oriented natural language dialogue for access to data 
and services is technically realizable within the short 
to medium term. 

A number of trials of the English system (including
the English variant Dialogue Manager) were
carried out. Each trial was only of modest size, and 
was designed to investigate particular questions. 
Though the results shown in Table 2 are for small 
samples, and represent the best results achieved so 
far, they are nonetheless broadly in keeping with a 
much larger set of results drawn from a number of 
other trials carried out under similar but not identical 
conditions. 


English prototype | Trial 1 | Trial 2
Subjects | staff (5 male) | naive (6 male)
25
Dialogue Manager | English variant | English variant


Table 2: English transaction success results over the 
PBX 


The subjects in Trial 1 were staff of Vocalis, all of 
whom were familiar with the idea of speech recogni- 
tion and most of whom had experience of using a 
speech recognizer. However, none of the subjects 
had contributed to the design or implementation of 
the SUNDIAL systems. These subjects therefore 
represent an intermediate group between the naive 
users and expert users of the Italian trials. Trial 2 
investigated the use of the English SUNDIAL sys- 
tem over the public switched telephone network 
(PSTN); to be specific, the subjects were located in 
Guildford and the system was located in Cambridge 
(a distance of more than 100 miles). The diminished 
performance in Trial 2 may be attributable in part to 
the difference between staff subjects and naive sub-
jects. However, the weakening and distorting effects
of the PSTN on the speech signal are almost certainly
a contributory factor.

There is clearly room for improvement in the
result recorded in Trial 2. However, SUNDIAL is 
one of the first projects to report results for spoken 
language (as opposed to isolated word or connected 
digit) recognition over the public telephone network. 
The transaction success figure of 70% is an encour- 
aging starting point for further work. Once again, 
transaction failure does not mean that the user is 
supplied with incorrect information or that the sys- 
tem hangs up. Rather, it means that the system 
recognizes that there are difficulties and switches the 
caller to an agent, rather than subjecting the user to 
the pain of a long drawn-out error correction process. 
Further information on the English SUNDIAL sys- 
tem trial results can be found in Ciaramella’. Results 
for the French system are reported in Magadur et al.”
and for the German system in Eckert et al.*


Example dialogue 

Perhaps the best way to get a feel for the capabilities 
of a dialogue system is to analyse the dialogues it 
produces. The following is an example of a real 
dialogue involving a naive user calling the system on 
a long distance telephone line. 


1. System: Welcome to British Airways flight
information service. How can I help
you?

2. User: Which terminal does BA five eight
zero leave from?

3. System: Was that British Airways flight five
eight zero?

4. User: Yes.

5. System: That flight departs from Heathrow
Airport Terminal 1. Do you have another
inquiry?

6. User: Yes.

7. System: What would you like to know?

8. User: Can you tell me the arrival time of
today’s Madrid flight?


9. System: What is the number of the flight? 
10. User: I don’t know. 

11. System: Where does the flight leave from? 
12. User: Madrid. 

13. System: Was that from Paris? 

14. User: No, Madrid. 

15. System: Did you say from Copenhagen? 
16. User: Madrid. 


17. System: Please answer ‘yes’ or ‘no’. Did you
say from Madrid? 


18. User: Yes. 

19. System: Please tell me the destination. 

20. User: Heathrow. 

21. System: Did you say to Heathrow Airport? 
22. User: Yes. 
23. System: Please tell me the departure time.
24. User: One o’clock in the afternoon. 

25. System: Was that thirteen hundred hours? 
26. User: Yes. 


27. System: British Airways flight four six seven 
leaves from Madrid Barajas Airport 
at thirteen hundred hours. It arrives at 

London Heathrow Airport, Terminal 
1, at fourteen fifteen. Do you have 
another inquiry? 


28. User: No thanks. 
29. System: Thank you for calling. Goodbye. 


5. Implications for translation 

The SUNDIAL project set out to investigate spoken 
natural language dialogue by designing and imple- 
menting a prototype system for each of the four 
project languages. It was successful, both in produc- 
ing working large-vocabulary speaker-independent 
systems and in bringing to light some important 
lessons about where effort can most effectively be 
directed to produce the best results. 

Though the work was carried out in a multilin- 
gual environment, it was never an intention to 
investigate interpretation, that is, dialogue between 
partners speaking different languages. However, the 
results of the SUNDIAL project bear on the whole 
question of inter-language communication in a 
number of ways. 

First, the project has shown that the enabling 
technology for interpretive telephony (i.e. spoken 
language translation over the telephone) is maturing 
to the point of being a practical possibility for limited 
domains. This observation is in keeping with recent 
results coming from the leading groups working on 
interpretive telephony around the world!'9?s. However,
encouraging though this result is, it is important
to stress that it only holds good for strictly limited 
domains. The idea of a general-purpose interpreting
machine which will cause mass unemployment
amongst professional interpreters is still a very long
way from being realized. 

Second, though SUNDIAL did not set out to 
produce an interpretive telephony system, it ended 
up producing one by accident! The same Dialogue 
Manager was the core of each of the language proto- 
types and the interfaces were constant across all 
systems. The net result was that system components 
could be mixed and matched. A first trial might 
connect an English input subsystem (Front End Proc- 
essor plus Linguistic Processor) and an English output 
subsystem (Message Generator plus speech synthe- 
sizer) to the generic Dialogue Manager. A second 
trial might, with equal ease, include an English input 
subsystem and a German output subsystem with the 
Dialogue Manager. This would result in the system 
understanding English but speaking German. So easy 
was this to do, that project workers occasionally 
produced this effect by mistake. Once the clean
modular structure to support this kind of behaviour 
was in place, it was a relatively simple matter to 
modify the system so that, rather than responding to 
the user’s questions, it simply echoed them in a 
different language. Though it was never designed as 
such, SIL turned out to be a very effective interlingua. 
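To make the mix-and-match idea concrete, here is a toy sketch in which a plain Python dictionary stands in for a SIL structure. Everything in it is hypothetical: parse_english, generate_english, generate_german and echo cover a single question pattern, whereas SIL was a much richer knowledge representation language. What it illustrates is that, once every input subsystem produces and every output subsystem consumes the same language-neutral frame, pairing an English input side with a German output side yields translation.

```python
import re
from typing import Callable, Dict

Frame = Dict[str, str]  # toy stand-in for a SIL structure (hypothetical)


def parse_english(utterance: str) -> Frame:
    """Hypothetical English input subsystem covering one question form."""
    match = re.fullmatch(r"When does flight (\w+) leave\?", utterance.strip())
    if match is None:
        raise ValueError(f"outside the toy grammar: {utterance!r}")
    return {"act": "ask", "attribute": "departure_time", "flight": match.group(1)}


def generate_english(frame: Frame) -> str:
    """Hypothetical English output subsystem for the same frame type."""
    return f"When does flight {frame['flight']} leave?"


def generate_german(frame: Frame) -> str:
    """Hypothetical German output subsystem for the same frame type."""
    return f"Wann fliegt der Flug {frame['flight']} ab?"


def echo(parse: Callable[[str], Frame], generate: Callable[[Frame], str], text: str) -> str:
    """Echo mode: understand in one language, restate in whichever language `generate` speaks."""
    return generate(parse(text))
```

Because the frame, not the surface string, is the interface, adding a language means adding one parser and one generator rather than one translator per language pair.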

This leads to a third observation, which is little 
more than a generalization of the previous one. It is 
widely acknowledged that machine translation is a 
very difficult problem. There has been much, often
heated, debate between those who advocate surface- 
oriented transfer and those who support translation 
based on a deeper (and consequently more language- 
neutral) representation of the meaning of a text. 
There are strong arguments on both sides, but per- 
haps the increasing amount of work being carried out 
on language understanding for its own sake (as in 
SUNDIAL) will begin to tip the balance in favour of 
interlingua approaches. The experience of the SUN- 
DIAL project was that machine translation was rela- 
tively straightforward in limited domains, once the 
problem of language understanding had been solved. 

A final observation is that task-oriented transla- 
tion is much simpler than general purpose translation. 
This is, of course, platitudinous. However, the obser- 
vation does not just apply at the level of tasks such as 
‘making a train timetable enquiry’; it also holds for 
the tasks which make up the micro-structures of 
dialogue. Thus, if speaker A has just asked speaker B 
which London railway station trains from York
arrive at, speaker B’s range of reasonable responses is
strictly limited to identifying a London railway 
station, admitting to not knowing the answer, or 
asking for the question to be repeated. 

If translation of unrestricted discourse is diffi- 
cult, then translation of task-oriented discourse is 
easier and translation of task-oriented interactive 
dialogue is easiest of all, given present technology 
limitations. 

Abstract

The concept of the virtual reality library is introduced and defined as a new form of OPAC. Since a desktop virtual
reality package is needed to construct a virtual reality library, the expected functionality of such software is
discussed in general terms. One such desktop virtual reality package, REND386, is then discussed in detail and
used to build a working prototype of a virtual reality library.


Introduction 

In a previous paper¹ I described in detail what I
termed the virtual reality library, which I defined as:
the bibliographic data from a typical library 
collection... being accessed via an interface 
which looks, on screen, like a roomful of 
shelves, and which a user would navigate and 
control using a device like a 3D mouse. 

The virtual reality library in essence is a compu- 
terization of the physical structures libraries use to 
order the information resources they contain: floors, 
rooms and shelves. I have opened with a definition
because information professionals are more used to
the similar-sounding phrase virtual library as a short-
hand for the concept of a user-accessible network of
computer-based information resources. I want to
state that the virtual reality library is not the same 
thing as a virtual library. The virtual reality library is 
in function a new form of OPAC, built using virtual 
reality technology. Oppenheim? gives a solid ac- 
count of this distinction. 

Why attempt to build a new form of OPAC 
using such expensive and arcane technology? The 
virtual reality library style of OPAC potentially fits 
more comfortably with user expectations about how 
to search for information than the current generation 
of query-based OPACs. Rather than have to frame 
queries, users would browse the virtual reality 
library. Browsing depends solely on a person's 
innate ability to recognize some intrinsic ordering. 
It is this ordering which makes browsing a viable 
strategy. Because similar items are stored together,
browsing involves looking for one relevant item, on
the assumption that its neighbours might also be
relevant. While shelf orders can be understood by
common sense reasoning, working on the example
of ordered items in sight, no such strategy exists for
the current generation of OPACs. With them the 
onus is on the user to build up a conceptual model in 
his or her mind of how such an OPAC works. The 
chief advantage of a virtual reality library then is 
that the model of the collection it presents is overt 
from the start. 
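The browsing strategy described above amounts to "find one relevant item, then look at its shelf neighbours". A minimal sketch, assuming a hypothetical collection whose shelf marks sort correctly as plain strings (true for the Dewey-style marks in the test, but a real OPAC would need proper call-number collation):

```python
from bisect import bisect_left
from typing import List, Tuple

Item = Tuple[str, str]  # (shelf mark, title); hypothetical record layout


def shelf_neighbours(shelf: List[Item], mark: str, width: int = 2) -> List[Item]:
    """Return items shelved within `width` places of `mark`.

    `shelf` must already be sorted by shelf mark, mimicking a physical shelf.
    If `mark` itself is absent, the window centres on where it would be shelved.
    """
    marks = [m for m, _ in shelf]
    pos = bisect_left(marks, mark)
    lo = max(0, pos - width)
    hi = min(len(shelf), pos + width + 1)
    return shelf[lo:hi]
```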


Of course, in the majority of libraries, there is no 
need for an OPAC which presents a browsable repre- 
sentation of a physical library on a computer screen. 
If a user does not want to use the OPAC they can 
browse the nearby shelf-based library collection for 
themselves. But if the ‘library collection’ is actually 
an online database or a CD-ROM, or if the actual 
library is physically distant, or if the library is closed 
access, then users have no shelf-browsing opportuni- 
ties. It is in these situations that the virtual reality 
library style of OPAC would be useful. 

Other writers*4 have proposed this idea. Last 
year for the first time, virtual reality technology, and
its potential for application in information work, was
given a chapter in the Annual Review of Infor-
mation Science and Technology*. While perhaps the
time has come for the idea of a virtual reality library,
one might still ask: is the technology up to it? There is
a need to dispel the notion that virtual reality is
expensive and ‘high-tech’, preferably by the appear-
ance of a cheap and easy-to-use virtual reality software
package.

In this paper I wish to implement as many as possible
of the ideas which I outlined in my previous
paper¹. Thus I want to cover the functions of a virtual
reality package, both in theory and in practice, to
show what can and cannot be done with the technol-
ogy. I will then describe my initial implementation
of a virtual reality library, using a virtual reality
package called REND386.


What does a virtual reality software package do?
Virtual reality software is not commonly available. 
As such its functionality is little known, compared to 
packages like word processors or spreadsheets. A 
virtual reality package is essentially a visualizer, a 
builder of a virtual world on a computer, which 
people can view and interact with. I quote a defini- 
tion of virtual reality: 
Virtual reality is a way for humans to visual- 
ize, manipulate and interact with computers 
and extremely complex data. 

Visualization is not just the generation of visual
images, but also auditory or other sensory inputs. The
concept of the virtual world goes back to the early
days of computer graphics. In 1965, Ivan Sutherland
laid out a research programme for computer graphics 
in a paper called “The ultimate display" that has 
guided research ever since. The ultimate goal is to 
make the virtual world look real, sound real and the 
objects act real. This is not possible now, and is 
likely to remain impossible for the foreseeable fu- 
ture. 

Some current systems use an ordinary computer 
screen to display the virtual world. This is a desktop 
virtual reality or a ‘Window on a World’ (WoW) 
system. Other systems use special devices (e.g. head- 
sets and data gloves) to let users experience the 
virtual world. These are known as immersive virtual 
reality systems. Since all that is required for a virtual 
reality library is a desktop virtual reality system, I 
will concentrate on describing the functions of virtual 
reality software in this type of system. 

There are four basic functional parts of a virtual 
reality program: 

• a world database
• a rendering process
• an input process
• a simulation process

The rendering process is responsible for drawing 
the virtual world, based on information contained 
within the world database. The simulation process 
continually informs the renderer about the user’s 
actions, which it obtains from the input process. This 
enables the renderer to redraw the image of the 
virtual world when the user changes his or her point 
of view, for example. 
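The cooperation of the four parts can be sketched as a per-frame loop. This is a toy illustration, not REND386's structure: the "renderer" returns a description of the frame instead of drawing it, the input process is a pre-recorded list of movement deltas, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class WorldDatabase:
    """Toy world database: named objects plus the user's current viewpoint."""
    objects: List[str]
    viewpoint: Vec3 = (0.0, 0.0, 0.0)


def simulate(world: WorldDatabase, move: Vec3) -> None:
    """Simulation process: apply the user's action (a movement delta) to the world state."""
    x, y, z = world.viewpoint
    dx, dy, dz = move
    world.viewpoint = (x + dx, y + dy, z + dz)


def render(world: WorldDatabase) -> str:
    """Rendering process stand-in: describe the frame drawn from the current viewpoint."""
    return f"view from {world.viewpoint} of {len(world.objects)} objects"


def run(world: WorldDatabase, input_events: Iterable[Vec3]) -> List[str]:
    """One loop iteration per frame: input -> simulation -> rendering."""
    frames = []
    for move in input_events:  # input process (here, recorded events)
        simulate(world, move)
        frames.append(render(world))
    return frames
```

Every pass through this loop adds to the system's latency, which is why the time taken by each stage matters so much.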

The critical factor here is the time required for 
processing. Every delay in response time degrades 
the user’s feeling of presence in, and the reality of, 
the virtual world. The latency of a virtual reality 
system, the time it takes to respond to user input, is a 
vital measure of its power. 

The virtual world itself needs to be defined in a 
world database. By its nature as a computer simula- 
tion, this world is necessarily limited. The computer 
must put a numeric value on the locations of each 
point of each object within the world. Usually these 
coordinates are expressed in Cartesian dimensions of 
X, Y and Z (length, height, depth). 

A limitation on virtual worlds is the type of 
numbers used for coordinates. Using floating point 
coordinates allows a very large range of numbers to 
be specified. Using fixed point coordinates provides 
uniform precision on a more limited range of values. 
One method of dealing with this limitation on the 
virtual world coordinate space is to divide a virtual 
world up into multiple virtual worlds and provide a 
means of moving between different virtual worlds. 
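The trade-off between the two coordinate types can be demonstrated numerically. The sketch assumes a 16.16 fixed-point layout in a 32-bit integer (a common choice, but an assumption here, not a documented REND386 format) and compares it with IEEE single precision. Fixed point has the same resolution of 1/65536 everywhere but runs out of range near ±32768; single-precision floats reach far larger values, but their spacing grows with magnitude.

```python
import struct

FRACTION_BITS = 16  # assumed 16.16 layout: 16 integer bits, 16 fraction bits
SCALE = 1 << FRACTION_BITS
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_fixed(x: float) -> int:
    """Encode x as a 16.16 fixed-point integer, rejecting values outside the 32-bit range."""
    raw = round(x * SCALE)
    if not INT32_MIN <= raw <= INT32_MAX:
        raise OverflowError(f"{x} is outside the fixed-point coordinate range")
    return raw


def from_fixed(raw: int) -> float:
    """Decode a 16.16 fixed-point integer back to a float."""
    return raw / SCALE


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]
```

For example, to_fixed(1e-6) is 0 because the value is below the fixed resolution, while to_float32(100_000_001.0) comes back as 100000000.0 because single-precision values near 10^8 are 8 apart.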

The storage of details about objects in the virtual 
world is the primary component of a world database.
Information is also stored on actions those objects
can perform, if activated by the user, and on lighting
and light sources in the virtual world. In computer 
terms the world database could be one large file. 
More typically though, a world file contains general 
information about the virtual world, while many 
separate object files contain details about individual 
objects in that virtual world. World and object files 
are most often stored as ASCII text files. These can 
be replaced by binary files. In some cases the world
information is compiled directly into the program.

An object file defines an object. Some programs 
provide only primitive objects, such as cubes, cones, 
and spheres. Sometimes, these objects can be slightly 
altered by the program to provide more interesting 
objects. The most common method is to build ob- 
jects from polygons. A polygon is a planar, closed, 
multi-sided figure. In an object file each polygon that 
comprises an object is defined by the set of coordi- 
nates that make up its vertices. The use of polygons 
often gives objects a faceted look. It is also difficult 
to draw objects with curved surfaces. Text (i.e. char-
acters) is, as a result, very difficult to add to a virtual
world. Some programs use simple triangles or quad- 
rilaterals instead of more general polygons. This can 
simplify object drawing, as all surfaces have a known 
shape. However, it can also increase the number of 
surfaces that need to be drawn. 
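A polygon-based object can be held as a vertex list plus polygons that refer to vertices by index. The sketch below uses that hypothetical in-memory layout (not a real object-file format) and fan-triangulates each polygon, assuming the polygons are convex, to show why triangles-only programs draw more surfaces: an n-sided polygon becomes n − 2 triangles, so a cube's 6 quadrilaterals become 12 triangles.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]


@dataclass
class Polygon:
    vertex_indices: List[int]  # indices into the object's vertex list, in boundary order
    colour: str


@dataclass
class PolygonObject:
    vertices: List[Vertex]
    polygons: List[Polygon]


def triangulate(obj: PolygonObject) -> List[Tuple[int, int, int]]:
    """Fan-triangulate every polygon (assumed convex) around its first vertex."""
    triangles = []
    for poly in obj.polygons:
        idx = poly.vertex_indices
        for a, b in zip(idx[1:-1], idx[2:]):
            triangles.append((idx[0], a, b))
    return triangles
```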

Each polygon also has a colour attribute and may 
have a texture map, a special surface appearance,
like glass or metal. An object may be part of an object 
hierarchy and would inherit the attributes of its par- 
ent object and pass these on to its child objects. 
Hierarchies can be used to create jointed figures such 
as robots and animals. 
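A small sketch of attribute inheritance in an object hierarchy. To keep it short it assumes positions are simple offsets from the parent (no rotations) and that a missing colour means "inherit"; both are simplifications, and the Node and flatten names are hypothetical. Moving a parent moves every descendant, which is what makes jointed figures possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    name: str
    offset: Vec3                   # position relative to the parent object
    colour: Optional[str] = None   # None means "inherit the parent's colour"
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node, parent_pos: Vec3 = (0.0, 0.0, 0.0),
            parent_colour: str = "grey") -> Dict[str, Tuple[Vec3, str]]:
    """Walk the hierarchy, composing positions and inheriting colours downwards."""
    pos = tuple(p + o for p, o in zip(parent_pos, node.offset))
    colour = node.colour if node.colour is not None else parent_colour
    result = {node.name: (pos, colour)}
    for child in node.children:
        result.update(flatten(child, pos, colour))
    return result
```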

Finally an object in a virtual world has a location 
and orientation in that world, which is typically 
stored in the world file. Also in the world file is the 
level of ambient (background) lighting and the loca- 
tion of any light sources, their orientation, colour, 
intensity and cone of illumination. The more numer- 
ous the light sources, the more computation is required 
to simulate their effect on objects. There is also at
least one viewpoint or camera described in the world
file. One viewpoint is always required for a user's 
starting position. But there may be others to show 
useful views of the virtual world. 

The rendering process of a virtual reality pro- 
gram uses the information in the world database to 
produce a view of the virtual world for a user. It may 
also render sounds and other sensations (e.g. touch 
with haptic rendering). The critical factor for the 
image renderer of virtual worlds is the frame genera-
tion rate. It is necessary to create a new frame (screen
image) every 1/20th of a second or faster, as this is
the minimum rate at which the human brain will
merge a stream of still images and perceive a smooth
animation. Cinema film runs at 24 frames per second,
television at 25 frames per second.
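The frame rates translate directly into a time budget per frame: 20 frames per second leaves 50 ms to produce each image, and 25 frames per second leaves 40 ms. A small sketch of that arithmetic, using the 20 frames per second threshold from the text:

```python
MIN_FRAME_RATE = 20  # frames per second, the threshold given in the text


def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def fast_enough(render_ms: float) -> bool:
    """True if a frame that takes render_ms to draw still meets the minimum frame rate."""
    return render_ms <= frame_budget_ms(MIN_FRAME_RATE)
```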

Another problem for the rendering process is that
virtual worlds have to have depth, i.e. objects must
only be drawn if they are visible and not obscured
by other objects. Methods such as the painter’s algo-
rithm or z-buffering are used to produce properly
drawn depth effects. In the painter’s algorithm, for
example, the furthest objects are drawn first, then
those progressively nearer the current user viewpoint.
Thus near objects will be drawn over far objects.
Speed of operation is crucial here. To try and
maximise speed, objects outside the user's viewing
cone are clipped, that is, not drawn at all. Tasks like
shading objects to realize lighting effects, or adding
texture maps to objects for special surfaces, slow
down the speed of the renderer.
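The two steps described above, clipping to the viewing cone and then drawing furthest-first, can be sketched as follows. As a simplification, each object is treated as a single point at its centre and depth is the distance from the eye; real renderers order individual polygons and must handle overlapping ones. The function name and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def painters_order(objects: List[Tuple[str, Vec3]], eye: Vec3, view_dir: Vec3,
                   half_angle_deg: float) -> List[str]:
    """Return names of visible objects in drawing order (furthest first)."""
    vx, vy, vz = view_dir
    view_len = math.sqrt(vx * vx + vy * vy + vz * vz)
    cos_limit = math.cos(math.radians(half_angle_deg))
    visible = []
    for name, (x, y, z) in objects:
        dx, dy, dz = x - eye[0], y - eye[1], z - eye[2]
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist == 0:
            continue  # object at the eye position: skip it
        cos_angle = (dx * vx + dy * vy + dz * vz) / (dist * view_len)
        if cos_angle >= cos_limit:  # inside the viewing cone, so keep it
            visible.append((dist, name))
    visible.sort(reverse=True)  # furthest first, so nearer objects are drawn over them
    return [name for _, name in visible]
```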

The input process of a virtual reality program 
controls the devices which allow the user to interact 
with the virtual world. There are a variety of possible 
input devices: keyboard, mouse, trackball and joy-
stick. There are special variations on these, e.g. the
‘spaceball’, a mouse that operates in three dimen-
sions. It is a ball mounted on a mouse. You can pull
and twist the ball in addition to the left/right and 
forward/back motions of a normal mouse. Remem- 
ber, I am only considering desktop virtual reality, 
and so exotic devices like head sets and data gloves 
can be ignored. 

Whatever input device is being used, it must 
provide three coordinates for user position (X, Y
and Z, as virtual worlds work in three dimensions)
and three values for user orientation (pitch, yaw and
roll, which equate to angles of rotation around the
X, Y and Z axes respectively).
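The six values can be held in one small record. The sketch below is illustrative only: it assumes the Y-up, Z-away axes that REND386 uses (described later), treats yaw as rotation about the vertical Y axis, and assumes that yaw 0 faces +Z and positive yaw turns toward +X. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    # Position in world units (millimetres at REND386's default scale).
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Orientation in degrees (assumed convention): pitch about X, yaw about Y, roll about Z.
    pitch: float = 0.0
    yaw: float = 0.0
    roll: float = 0.0

    def move_forward(self, distance: float) -> None:
        """Walk along the horizontal heading given by yaw; pitch and roll are ignored."""
        heading = math.radians(self.yaw)
        self.x += distance * math.sin(heading)
        self.z += distance * math.cos(heading)

viewer = Pose()
viewer.move_forward(1000)   # one metre straight ahead
viewer.yaw = 90
viewer.move_forward(500)    # turn and walk half a metre
print(round(viewer.x), round(viewer.y), round(viewer.z))
```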

The heart of a virtual reality program is the 
simulation process. This process feeds information it 
gets from the input process about user actions con- 
tinually to the renderer process, so that the image of 
the virtual world (and any other interaction results 
necessary, like sounds) can be redrawn fast enough 
so that the virtual world appears to the user to be 
responding to them as they would expect. 
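The cycle described above (input, then simulation, then rendering, repeated fast enough to look continuous) can be sketched as a fixed-rate loop. The function and callback names are hypothetical; a real package would also cope with frames that overrun their budget, for example by dropping detail.

```python
import time
from typing import Any, Callable, Dict

def run(read_input: Callable[[], Dict[str, Any]],
        simulate: Callable[[Dict[str, Any]], Any],
        render: Callable[[Any], None],
        frames: int, fps: float = 20.0) -> None:
    """Run the input -> simulation -> render cycle for a number of frames."""
    frame_budget = 1.0 / fps
    for _ in range(frames):
        start = time.perf_counter()
        events = read_input()        # user actions since the previous frame
        state = simulate(events)     # apply actions, physical rules and scripts
        render(state)                # redraw the view for the user
        # Wait out any unused part of the frame budget; a frame that overruns
        # simply makes the next one start late.
        remaining = frame_budget - (time.perf_counter() - start)
        if remaining > 0:
            time.sleep(remaining)

# Usage: a user who keeps pushing "forward" moves 100 mm along Z per frame.
position = {"z": 0}
drawn = []

def read_input():
    return {"dz": 100}

def simulate(events):
    position["z"] += events["dz"]
    return dict(position)

def render(state):
    drawn.append(state["z"])

run(read_input, simulate, render, frames=3, fps=200)
print(drawn)  # [100, 200, 300]
```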

Another important function for the simulation 
process is to enforce physical laws in the virtual 
world. Left to themselves, walls in a virtual world are
just images, so a user can pass through them, and
gravity is non-existent: objects float in the air at the
point at which they were released. Scripts are rather
like macros for the virtual world, in that they cause 
something to happen, an object to rotate or change 
colour etc. Scripts can allow users to pick up and 
move objects. Scripts can also signal collisions, that 
is one object impinging upon the space of another. 
The simulation process is responsible for starting 
and stopping scripts as appropriate in the virtual 
world. 
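One common way to detect "one object impinging upon the space of another" is to compare axis-aligned bounding boxes. This is a minimal sketch under that assumption (it is not REND386's internal method): the simulation refuses any move that would make the user's box overlap a wall, so the wall behaves as if it were solid.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box given by its minimum and maximum (x, y, z) corners."""
    lo: tuple
    hi: tuple

    def overlaps(self, other: "Box") -> bool:
        # Two boxes collide only if their extents overlap on every axis.
        return all(self.lo[i] < other.hi[i] and other.lo[i] < self.hi[i] for i in range(3))

def try_move(box: Box, delta, obstacles) -> Box:
    """Return the moved box, or the original one if the move would cause a collision."""
    moved = Box(tuple(a + d for a, d in zip(box.lo, delta)),
                tuple(a + d for a, d in zip(box.hi, delta)))
    if any(moved.overlaps(o) for o in obstacles):
        return box
    return moved

wall = Box((1000, 0, -5000), (1100, 3000, 5000))   # thin wall, 1 m to the right
user_box = Box((0, 0, 0), (400, 1800, 400))
user_box = try_move(user_box, (500, 0, 0), [wall])  # free space: the move succeeds
user_box = try_move(user_box, (500, 0, 0), [wall])  # would enter the wall: blocked
print(user_box.lo)  # (500, 0, 0)
```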

Finally a virtual reality program operates in one 
of two modes, authoring or playback, and not 
generally both together. 

Authoring mode might involve the use of a text
editor to create world and object files for the world
database. In playback mode the world database is
read in by the virtual reality program and the three
processes go to work to bring the virtual world to
the screen.


Desktop virtual reality packages 

For the virtual reality library a desktop virtual reality 
package would be ideal. There is no need for 
immersion of the user (in fact having to don a headset 
would be off-putting if one just wanted to
browse a library collection). Desktop virtual reality
packages are also cheaper than immersive ones:
among other things, they do not have to output to
esoteric devices like head sets, nor track body
position so that movements the immersed user makes
in the real world can be translated into similar
movements in the virtual world. Desktop virtual
reality packages tend to run on standard computer 
systems as well. 

The sole UK-based supplier of desktop virtual
reality systems is Dimension International (Zephyr
One, Calleva Park, Aldermaston, Berkshire RG7
4QZ). They supply a package for building virtual 
worlds called Superscape. Superscape runs only on 
high-end PC compatibles. Superscape has been suc- 
cessfully used in a number of commercial and research 
projects, for example, the Knightmare game on tel- 
evision, and a project for using virtual worlds to help 
problem children learn interaction skills, based at
Nottingham University. 

Perhaps the market leader in this area is a package
called WorldToolKit, produced by Sense8 Corpora-
tion (1001 Bridgeway, P.O. Box 477, Sausalito CA
94965). This runs on 486-based PCs, as well as
workstations from Sun and Silicon Graphics. A Win-
dows version was recently announced.

Much cheaper than these two packages, because 
it is free, is a package called REND386. Written by
virtual reality enthusiasts Dave Stampe and Bernie
Roehl at the University of Waterloo, Ontario, it is a
virtual world builder that has modest hardware
requirements: a 386 or 486 PC, 800K of hard disk
space, 540K of free memory and a fast video card for
best results. It is available by anonymous FTP
from sunee.uwaterloo.ca (129.97.50.50) as the
file /pub/rend386/devel5.zip, and also from a
number of other sites worldwide. It also has drivers
for Sega shutter glasses and the Nintendo Powerglove,
two low-cost immersive devices.

Its authors moderate an email list for REND386 
users. To subscribe send an email message to: 

rend386-request@sunee.uwaterloo.ca 

with a body of ‘subscribe your full name’. Thus 
answers to technical questions and solutions to prob- 
lems can be gleaned either direct from the authors, or 
from the band of keen REND386 users who read the 
list. More recently a book has appeared which ex-
plains in detail the features and workings of
REND386⁸.

Since my stated aim was to use a cheap, easily 
available virtual reality desktop package, REND386 
appeared to be ideal. 

Building virtual worlds with REND386

Although it is free, REND386 is a fully-featured, 
powerful desktop virtual reality package. It uses a 
left-handed coordinate system which means that the 
X axis runs (left to right) horizontally, the Y axis 
(bottom to top) vertically, and the Z axis extends 
away (near to far). When looking at the computer 
screen, one should imagine the X and Y axes running 
along the bottom and left-hand side of the screen 
respectively, and the Z axis running away along the 
monitor housing behind the screen. Each coordinate 
is an integer value within the range -8,000,000 to
8,000,000. Each integer unit equals a distance in the 
virtual world of 1 millimetre (although this can be 
altered). 
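A sketch of converting measurements into coordinates of this kind: integer millimetres, limited to ±8,000,000. It also assumes, for illustration, that source data may come from a right-handed system whose Z axis points toward the viewer. In that case negating Z gives the left-handed, Z-away convention described above. The function name and parameter are hypothetical.

```python
COORD_LIMIT = 8_000_000  # +/- range of each integer coordinate, in millimetres

def to_world_units(x_m: float, y_m: float, z_m: float, right_handed: bool = False):
    """Convert metres to integer millimetre coordinates (X right, Y up, Z away).

    right_handed=True assumes the source Z axis points toward the viewer,
    so Z is negated to obtain the left-handed convention.
    """
    if right_handed:
        z_m = -z_m
    coords = tuple(round(v * 1000) for v in (x_m, y_m, z_m))
    for c in coords:
        if not -COORD_LIMIT <= c <= COORD_LIMIT:
            raise ValueError(f"coordinate {c} mm is outside the world's range")
    return coords

print(to_world_units(2.5, 1.8, 10.0))                      # (2500, 1800, 10000)
print(to_world_units(2.5, 1.8, -10.0, right_handed=True))  # (2500, 1800, 10000)
```

At one unit per millimetre, the usable world is therefore 16 km across on each axis.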

Building a virtual world consists of creating a
world file (a text file with a .WLD extension), which
is loaded into REND386 for playback. The world file
contains a series of statements. A hither statement 
determines how close an object has to be to a user 
before it gets clipped. The default value is 10 units or 
10 millimetres. A yon statement determines the fur- 
thest object/s a user could see, as anything further 
away would get clipped. The default value is 14 
million units or 14 kilometres or approximately 9 
miles. Thus one can see much further in a virtual
world using REND386 than in the real world!
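The effect of the hither and yon values can be expressed as a simple depth test. This is a sketch only, with the defaults quoted above. It assumes inclusive limits, which is a guess, and it is not how REND386 implements clipping internally.

```python
HITHER = 10           # default near clipping distance: 10 units (10 mm)
YON = 14_000_000      # default far clipping distance: 14 million units (14 km)

def survives_depth_clip(distance: float, hither: float = HITHER, yon: float = YON) -> bool:
    """True if a point this far along the view direction lies between hither and yon."""
    return hither <= distance <= yon

MM_PER_MILE = 1_609_344  # one statute mile is exactly 1609.344 m
print(f"yon = {YON / MM_PER_MILE:.1f} miles")  # roughly 8.7 miles
```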

Up to 10 camera locations can be defined by camera
statements; the user switches between them by
pressing a function key. The first camera location is
the start location for the user.

There are three forms of object statement in 
REND386. The simplest is polyobj which in the 
world file defines one polygon in terms of number of 
vertices, surface type, and the X,Y,Z coordinates of 
vertices. The vertices must be listed in clockwise
order, as seen from the side that is to be visible, for
the polygon to be drawn correctly. Such a polygon
is opaque from one side only; to make it visible
from both sides, another polygon must be defined
using the same coordinates in anti-clockwise order.
This one-sidedness can be a useful feature of REND386 in
that it is possible to create ‘one-way walls’ which are 
invisible from one side. There is also a polyobj2 
statement which builds a polygon opaque on both 
sides and it needs two surface types instead of one. 
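Whether a polygon is seen from its front or its back can be decided from the winding of its vertices once they are projected onto the screen. The sketch below uses the shoelace (signed area) formula. It assumes screen X to the right and Y upward, and takes "clockwise on screen" to mean "facing the viewer" to match the text. Seen from behind, the same vertices appear anti-clockwise, which is why a single polygon is invisible from one side.

```python
def signed_area(points):
    """Shoelace formula: negative for clockwise vertex order (X right, Y up),
    positive for anti-clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def faces_viewer(projected_vertices) -> bool:
    """Draw the polygon only if its projected vertices run clockwise (assumed convention)."""
    return signed_area(projected_vertices) < 0

square_cw = [(0, 0), (0, 1), (1, 1), (1, 0)]
square_ccw = list(reversed(square_cw))
print(faces_viewer(square_cw), faces_viewer(square_ccw))  # True False
```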

Complex objects are made up of many polygons 
and definitions for these are stored in object files
(text files with a .PLG extension). These are referred 
to in the world file by object statements which essen- 
tially give the file name of the object file. REND386 
can load these object files and show the objects they 
contain in a virtual world. Objects can be scaled,
rotated and translated on loading, so that one object 
can be loaded many times into one virtual world (e.g. 
one ‘tree’ object can be repeatedly used, slightly 
differently scaled and positioned, to build a forest). 
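Re-using one object file many times amounts to applying a scale, a rotation and a translation to the same template vertices. A minimal sketch follows; the "tree" vertices are invented, only rotation about the vertical Y axis is shown, and the rotation's sign convention is an assumption.

```python
import math

# Hypothetical "tree" template: a few vertices in object coordinates (mm).
TREE = [(0, 0, 0), (0, 3000, 0), (-800, 1500, 0), (800, 1500, 0)]

def place(vertices, scale=1.0, yaw_deg=0.0, offset=(0, 0, 0)):
    """Scale, then rotate about the vertical Y axis, then translate each vertex."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    placed = []
    for x, y, z in vertices:
        x, y, z = x * scale, y * scale, z * scale
        x, z = x * c + z * s, -x * s + z * c   # rotation about Y (assumed sign convention)
        placed.append((round(x + offset[0]), round(y + offset[1]), round(z + offset[2])))
    return placed

# One template loaded three times, each slightly differently scaled, rotated and placed.
forest = [place(TREE, 1.0, 0, (0, 0, 5000)),
          place(TREE, 1.2, 30, (4000, 0, 7000)),
          place(TREE, 0.8, 75, (-3000, 0, 9000))]
print(forest[1][1])  # top of the second tree: (4000, 3600, 7000)
```

Only the template is stored once; each placement is just three small sets of parameters, which is what makes a forest cheap to describe.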

Each polygon of each object gets a surface 
specifier in the object statement, a four-digit hexa- 
decimal code, which denotes type of surface 
(fixed-colour, shaded, metallic or transparent) and 


154 


either colour and brightness or a special index value 
for the last three types of surface. 

A parent object can also be declared in the object
statement, to which the current object will be attached.
An object will perform any action that its parent
object performs (so that complex figures, which can
move together, can be constructed out of object
parent-child hierarchies).
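Parent-child attachment means that a child's position is stored relative to its parent, so moving the parent moves everything attached to it. A minimal sketch, handling translation only for brevity (a real hierarchy would also compose rotations); the class name is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SceneObject:
    """An object positioned relative to its parent (translation only, for brevity)."""
    name: str
    local: tuple = (0, 0, 0)
    parent: Optional["SceneObject"] = None

    def world_position(self) -> tuple:
        # Walk up the hierarchy, adding each ancestor's offset.
        x, y, z = self.local
        node = self.parent
        while node is not None:
            x, y, z = x + node.local[0], y + node.local[1], z + node.local[2]
            node = node.parent
        return (x, y, z)

body = SceneObject("body", (0, 1000, 5000))
arm = SceneObject("arm", (300, 400, 0), parent=body)
hand = SceneObject("hand", (0, -500, 0), parent=arm)
body.local = (2000, 1000, 5000)   # move the parent ...
print(hand.world_position())      # ... and the hand follows: (2300, 900, 5000)
```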

Finally a surface map specifier can be declared in 
an object statement. This is essentially a named 
surface type (e.g. cement, brick) which has already
been defined by a surface specifier in a surfacedef 
statement. In this way surfaces can be referred to by 
name, making them easier to remember and use, and 
the same surface can be used by a number of different 
polygons in an object. 
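The sketch below shows how a four-digit hexadecimal specifier can pack a surface type and its parameters into one number, and how named surfaces map to such codes. The bit layout is a **hypothetical** one chosen for illustration: top digit = surface type, next digit = hue, low two digits = brightness or index. It is not taken from the REND386 documentation, and the surface names and codes are invented.

```python
# Hypothetical layout: 0xTHBB -> T = surface type, H = hue, BB = brightness/index.
SURFACE_TYPES = {0: "fixed-colour", 1: "shaded", 2: "metallic", 3: "transparent"}

def decode_surface(spec: str) -> dict:
    """Split a four-hex-digit specifier into its (assumed) fields."""
    value = int(spec, 16)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("surface specifier must be four hexadecimal digits")
    return {"type": SURFACE_TYPES.get(value >> 12, "unknown"),
            "hue": (value >> 8) & 0xF,
            "value": value & 0xFF}

# Named surfaces in the spirit of surfacedef statements (names and codes invented).
SURFACES = {"cement": "1A80", "brick": "1C60"}
print(decode_surface(SURFACES["brick"]))  # {'type': 'shaded', 'hue': 12, 'value': 96}
```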

The renderer process in REND386 uses the paint- 
er’s algorithm for giving depth to virtual worlds, but 
with the facility of defining splitting planes to resolve
any drawing problems related to depth. Splitting 
planes are used to divide the world into smaller areas 
by forming invisible walls. Objects on the far side of 
a splitting plane are drawn first, then objects on the 
splitting plane and finally objects on the near side of 
the splitting plane. The ordering of objects caused by 
the placement of the splitting plane overrides the 
normal ordering of objects that the painter’s algo-
rithm would use. It is the task of the designer of a
virtual world to add splitting planes to resolve draw- 
ing problems caused by following the painter’s 
algorithm. Splits must be defined in a hierarchy. The 
first split divides the world into two areas, the second 
split divides one of those areas into two areas, etc. 
These areas can be named in REND386. 
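A hierarchy of splits like this behaves like a binary space partitioning (BSP) tree. The sketch below is illustrative and does not reproduce REND386's world-file syntax. For any eye position it draws the far side of each split first, then objects lying on the split, then the near side, recursing down the hierarchy. All class names, planes and object names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Area:
    name: str
    objects: List[str]

@dataclass
class Split:
    """A splitting plane a*x + b*y + c*z = d dividing space into two sides."""
    normal: tuple          # (a, b, c)
    d: float
    front: "Region"        # the side the normal points towards
    back: "Region"
    on_plane: List[str]    # objects lying on the plane itself

Region = Union[Area, Split]

def draw_order(region: Region, eye: tuple) -> List[str]:
    """Far side first, then objects on the plane, then the near side."""
    if isinstance(region, Area):
        return list(region.objects)
    side = sum(n * e for n, e in zip(region.normal, eye)) - region.d
    near, far = (region.front, region.back) if side >= 0 else (region.back, region.front)
    return draw_order(far, eye) + list(region.on_plane) + draw_order(near, eye)

# First split at x = 0; the second split divides the east half at z = 5000.
world = Split(normal=(1, 0, 0), d=0, on_plane=["doorway"],
              front=Split(normal=(0, 0, 1), d=5000, on_plane=[],
                          front=Area("east far", ["shelf C"]),
                          back=Area("east near", ["shelf B"])),
              back=Area("west", ["shelf A"]))
print(draw_order(world, eye=(-1000, 1500, 0)))  # ['shelf C', 'shelf B', 'doorway', 'shelf A']
```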

The input process in REND386 can use the key- 
board or a standard mouse. The cursor keys can be 
enhanced by using one of the shift keys to give 
movement abilities in three dimensions. The same 
applies to a mouse, but its buttons are used instead of 
the shift keys. Since the mouse can also be used to
select objects, a special key (J) switches it between 
movement and selection modes. 

The simulation process recognizes mouse clicks 
as the way for a user to select and interact with 
objects. When selected, an object is highlighted by
being given white edges. The object can be moved by 
the user, or re-coloured or re-surfaced. The object 
can be attached to the user (or the user can attach to 
an object) so that both move together. Selected ob- 
jects can be saved to disk, while new objects can be 
loaded into the virtual world. 

There are many more features and commands/ 
statements in REND386 than have been covered 
here. The interested reader is referred to the excellent 
book by the creators of REND386, Dave Stampe and
Bernie Roehl⁸.


Browsing the virtual reality library 


So now to implementation. Since there are no design 
guides available yet for building virtual reality 
libraries, I will use the best evocation of one I have
read, which is not in the literature of information and
library studies at all but in fiction. Jorge Luis Borges,
director of the Argentine National Library and noted
fabulist, describes an infinite library, made of
interconnected hexagonal galleries, in the story
The Library of Babel⁹. The hexagonal galleries inter-
lock, with passages to connect them. Each gallery
adjoins a vertical stairwell, which has no beginning
and no end.

My virtual reality library alas cannot be infinite. 
The world file for it (LIBBABEL.WLD) specifies 4 
floors, one each for authors, titles, subjects and clas- 
sified arrangements. Each floor is a named area 
(‘authors’, ‘titles’, etc., as appropriate) in REND386 so
that one can identify on which floor one is browsing. 
Under the library is an area known as ‘basement’: 
over it is one known as ‘roof’. 
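The effect of the floor names can be pictured as a lookup from the user's height to an area name. This is a hypothetical sketch: the article does not give floor heights or say which floor is lowest. The 4 m storey height and the bottom-to-top order below are assumptions, and REND386 itself derives area names from splitting planes rather than from a table like this.

```python
# Assumed layout: four 4 m (4000 mm) storeys stacked in the order listed in the text.
FLOOR_HEIGHT = 4000
FLOORS = ["authors", "titles", "subjects", "classified"]

def area_for_height(y_mm: int) -> str:
    """Name of the area containing a user at height y_mm (hypothetical geometry)."""
    if y_mm < 0:
        return "basement"
    level = y_mm // FLOOR_HEIGHT
    return FLOORS[level] if level < len(FLOORS) else "roof"

print([area_for_height(y) for y in (-500, 1700, 9000, 15999, 16000)])
```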

Another clue as to whereabouts is the colour 
scheme I have adopted for carpets and walls. Each 
floor has its own colour scheme (which hopefully is 
easy on the eye). There is a central well area (stairs 
are not needed) in which one can move up or down 
between the floors. If one is adventurous, one can 
also move up and down through floors and ceilings. 

On each floor are four named ‘finding areas’. 
Each one is named for that part of the sequence of 
items it contains (e.g. ‘authors A-E’, etc). Off each 
finding area are the items themselves, object repre- 
sentations of the document collection. Each item is 
an object, coloured according to publisher. The item 
details appear as the name of the area that object
occupies, so one actually has to go inside an item to
find out what it is. 

Another aid to navigation is a set of eight camera
viewpoints, two on each floor, which can be switched to
by a function key press. These are the fastest way to
move around. There are also two external camera
viewpoints. The library's walls are transparent from
the outside, so by switching to one of these one can
browse from outside the library, looking in.

When one has found an interesting item one can 
mark its location by use of a ‘placemark’ object, a 
black cube. This is a special object file which can be 
loaded into the library when needed.

This virtual reality library is only a prototype. It 
lacks some of the features I describe in my previous
paper¹. REND386, although powerful, lacks some
facilities. The chief one of these is the lack of a text
display for each object. The area is the only element
in a REND386 virtual world whose name can be
displayed. The other missing element is the ability
to choose a relevant object (by clicking on it with the
mouse) and see that object highlighted in all its
locations in the virtual reality library. The idea is
then to browse around this highlighted object in all
its locations, as its neighbouring items may also
be relevant.

I have tried to show what can be done by using 
the intrinsic facilities of REND386. All is not lost 
however when these are not sufficient. REND386 is 
also available as a set of C/C++ programming librar- 
ies. The next step is to add the facility of displaying 
object names as text, and the highlighting of an
object in all its locations. Once this has been done, 
the concept of the virtual reality library will have 
finally achieved reality. 


REFERENCES

1. POULTER, A. Towards a virtual reality library.
Aslib Proceedings. 1993, 45(1), pp.11-17.

2. OPPENHEIM, C. Virtual reality and the virtual
library. Information Services and Use. 1993, 13(3),
pp.215-227.

3. SPRING, M.B. Informating with reality. In:
HELSEL, S. and ROTH, J., eds. Virtual reality:
theory, practice and promise. Meckler, 1991. pp.3-17.

4. SEILER, L.H. and SURPRENANT, T.T. The
virtual reality information center. In: Computers in
Libraries International 92: proceedings of the sixth
annual conference on Computers in Libraries held
in London in February 1992. Meckler, 1992. pp.119-122.

5. NEWBY, G.B. Virtual reality. Annual Review of
Information Science and Technology. 1993, 28,
pp.187-229.

6. AUKSTAKALNIS, S. and BLATNER, D. Silicon
mirage: the art and science of virtual reality. Peach
Pit Press, 1992.

7. SUTHERLAND, I. The ultimate display. In:
KALENICH, W.A., ed. Information Processing 1965:
proceedings of the IFIP Congress. Spartan Books,
1965.

8. STAMPE, D. and ROEHL, B. Virtual reality crea-
tions. Peach Pit Press, 1993.

9. BORGES, J.L. Labyrinths: selected stories and
other writings. Harmondsworth: Penguin Books, 1970.







The role of GIS in the management of natural resources 







G C Deane 


Hunting Technical Services Limited, Thamesfield House, Boundary Way, Hemel Hempstead, Herts HP2 7SR 
Paper presented at the Aslib Electronics Group Annual Conference held at Danbury Park Management Centre, Chelmsford, 12-14 May 1994.


Most aspects of land resources management require 
information on the current extent of features and the 
ways in which their distribution has changed in the 
past. Such information can be collected by ground 
survey and/or the use of aerial survey and satellite 
imagery. By capturing these spatial data on a 
computer-based geographical information system and 
overlaying different data sets the land resource planner 
and manager have the capability to analyse changes 
in the distribution of features. This is important for 
assessing the impact of previous planning decisions 
and for carrying out inventories of existing 
procedures. 

By choosing appropriate modelling criteria, the 
possible future outcome of different planning deci- 
sions relating to resource use and conservation can 
be predicted using the analytical capabilities of the 
GIS. This paper examines a number of typical appli- 
cations of GIS technology for resources inventory 
and management. These include the use of GIS for 
the establishment of a resources database and the 
digital preparation of maps in Yemen; the analysis of 
changing patterns of forestry and the assessment of 
site suitability for teak planting in Tanzania; crop 
inventory in Bangladesh and Spain; and environ- 
mental sensitivity mapping in the Black Sea. 

In addition to information about physical re- 
sources, a management information system must be 


able to handle socio-economic and other supporting 
data, so that the sustainable utilization, conservation 
or protection of physical resources can be effectively
planned. The use of computer based systems to draw 
together these data enables a number of ‘what if...’ 
analyses to be carried out so that optimized resource 
management plans can be developed. 


1) Introduction 

A Geographical Information System (GIS) is a ‘sys-
tem for capturing, storing, checking, integrating,
manipulating, analysing and displaying data which
are spatially referenced to the Earth. This is normally
considered to involve a spatially referenced compu-
ter database and appropriate applications software’.
This definition, included in the Chorley report on 
Handling Geographic Information (DoE, 1987), sum- 
marises the wide ranging nature of a GIS. However, 
many of the tasks performed in a GIS are already 
familiar to resource managers; they are familiar with 
integrating and analysing a variety of data, for exam- 
ple in the assessment of land suitability in the land 
use planning process. The linkage of data within a 
GIS is shown in Figure 1. 

Before the advent of computer-based GIS the 
integration of spatial datasets could be a laborious 
and time-consuming activity. It involved transform-
ing the datasets to a common map scale, creating a
transparent overlay for each dataset, registering these
overlays so that their coordinate systems were aligned
and then manually creating a composite overlay
sheet. A GIS makes data integration a much faster
and more straightforward task.

Figure 1: Linkage of data within a GIS
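Registering a digitized map to a ground coordinate system, the digital counterpart of aligning transparent overlays, is typically done with control points. The minimal sketch below fits an affine transformation from exactly three control points by solving two 3x3 linear systems. Operational systems usually use more points with a least-squares fit, and sometimes higher-order polynomials, so treat this as an illustration only. The coordinates are invented.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def solve3(m, v):
    """Solve the 3x3 system m . u = v by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points must not be collinear")
    return [det3([row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]) / d
            for k in range(3)]

def fit_affine(map_points, ground_points):
    """Fit E = a*x + b*y + c and N = d*x + e*y + f from three control points."""
    m = [[x, y, 1.0] for x, y in map_points]
    a, b, c = solve3(m, [east for east, _ in ground_points])
    d, e, f = solve3(m, [north for _, north in ground_points])
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)

# Digitizer-table positions (cm) on a 1:50 000 sheet, where 1 cm = 500 m,
# and invented grid coordinates (m) for the same three points.
to_grid = fit_affine([(0, 0), (10, 0), (0, 10)],
                     [(400000, 1500000), (405000, 1500000), (400000, 1505000)])
print(to_grid(4, 6))  # (402000.0, 1503000.0)
```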

GIS techniques are increasingly being used to 
integrate and analyse datasets for applications rang- 
ing from emergency planning and the management 
of urban areas to land use planning as well as re- 
sources management. Organizations that use GIS 
include national and local governments, the utilities
industries (gas, electricity and water), retail stores,
banking and finance institutions, environmental agen- 
cies, water authorities, topographic mapping agencies 
and educational establishments. In all cases the as- 
semblage of a wide range of disparate datasets brings 
new perspectives to the planning process. 

An increasing number of land resource planning
and management decisions are based on the results
of GIS analysis. Once the relevant datasets have been
converted to digital format, the GIS may be used to
explore spatial relationships between them and to
examine the outcome of any number of ‘what
if...’ scenarios. These analyses are based on data
derived from a variety of sources, such as aerial
photography, satellite imagery, ground survey and
existing paper maps. Once these have been trans-
formed to digital data and referenced to the same
coordinate system they may be combined within the
GIS. The ability of a GIS to perform rapid and
efficient spatial modelling enables its successful ap-
plication for land evaluation and suitability
assessment as well as management of the utilization
and conservation of natural resources. A schematic
diagram of data inputs to a GIS is shown in Figure 2.

Figure 2: Integrated Remote Sensing and GIS

A GIS is built up from layers of information, enabling analysis of resource distribution and planning decisions.

Image Acquisition
All types of digital satellite imagery or scan-digitized aerial photography can be input to the system.

Image Processing: Geometric Correction
Geometric correction of imagery is carried out using known reference points. Where these are not available, data from Global Positioning Satellites can be used to assist geometric correction.

Image Processing: Enhancement
Imagery can be enhanced, the type of enhancement being strongly dependent on the proposed application. It ranges from a simple contrast stretch to improve image appearance to more complex procedures such as edge enhancement and principal components analysis.

Classification
Automatic classification of the imagery can be performed utilizing appropriate software. This information on land cover provides an important layer of up-to-date information for the GIS.

Manual Interpretation
A manual interpretation can be made from hardcopy by staff experienced in remote sensing and land resources.

Digitizing Overlays
Overlays produced by interpreters or automatic classification can be digitized or scanned for input to the GIS.

Data Analysis
Different datasets can be combined and analysed within the GIS.

Digital Mapping and Map Production
The digital output of data analysis can be formatted to fit a standard mapsheet layout or project-specific layouts, which can be designed within the GIS.

The aim of this paper is to illustrate some examples 
of the use of GIS for natural resources management. 
This will include discussion of both the benefits and 
drawbacks of using this technology. In addition com- 
ments will be made regarding the requirements and 
considerations necessary for the establishment of an 
operational GIS facility. 


2) Case Studies 

The widespread use of GIS has developed during the 
last ten or so years. The improvements in both gen-
eral hardware technology, specifically increased
computing power at less cost, and the development
of applications software enable not only major na- 
tional agencies but also individual projects to share 
in this technology. Outlined in this section are some 
examples of where GIS has been of assistance to 
resource managers at the stage of inventory/resources 
assessment; planning for use and conservation of 
resources; and continuous management of the re- 
source. 


i) Woodland resources mapping in Yemen 

This study, which has recently been completed, was 
designed to produce a comprehensive database of the 
woodland resources of the whole country. The input 
data were a combination of interpretations from satel-
lite images and aerial photographs supported by field
data. This information was then digitized using a
software package running on a PC-compatible
computer. In this case the software was Arc/Info, but
the important point is that digitizing a large database
can be satisfactorily achieved using a PC-based system.

Once the data had been captured, the digital 
database was used to prepare final maps at 1:50 000 
scale (153 sheets in all) using digital cartographic 
techniques. Additionally area estimates of the differ- 
ent woodland categories were calculated by 
administrative area and eco-climatic zone; the 
boundaries of these were derived from other layers in 
the database. This information could also be coupled 
with socio-economic data, such as population den-
sity, in order to identify areas with a surplus or deficit
of potential fuelwood supplies.
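Area estimates by administrative unit, and a surplus or deficit figure derived from them, can be sketched as a cell-by-cell overlay of co-registered raster layers. Everything in this example is invented for illustration: the grids, the one-hectare cell size, the fuelwood yields, the consumption per person and the populations. The Yemen project's own figures and methods are not given in the text.

```python
from collections import defaultdict

CELL_HA = 1.0  # hypothetical: each raster cell covers one hectare

# Two co-registered raster layers with invented values: the woodland
# category of each cell and the administrative area it falls in.
woodland = [["dense", "open",  "none"],
            ["open",  "open",  "none"],
            ["none",  "dense", "open"]]
district = [["North", "North", "North"],
            ["North", "South", "South"],
            ["South", "South", "South"]]

def area_by_zone(categories, zones, cell_area=CELL_HA):
    """Overlay the two layers cell by cell and total the area per (zone, category)."""
    totals = defaultdict(float)
    for cat_row, zone_row in zip(categories, zones):
        for category, zone in zip(cat_row, zone_row):
            totals[(zone, category)] += cell_area
    return dict(totals)

# Invented figures: sustainable fuelwood yield (tonnes/ha/year) per category,
# consumption per person (tonnes/year) and district populations.
YIELD = {"dense": 5.0, "open": 1.5, "none": 0.0}
USE_PER_PERSON = 0.5
POPULATION = {"North": 20, "South": 4}

def fuelwood_balance(areas):
    """Supply minus demand per district: positive = surplus, negative = deficit."""
    balance = {}
    for zone, people in POPULATION.items():
        supply = sum(ha * YIELD[cat] for (z, cat), ha in areas.items() if z == zone)
        balance[zone] = supply - people * USE_PER_PERSON
    return balance

areas = area_by_zone(woodland, district)
print(fuelwood_balance(areas))  # {'North': -2.0, 'South': 6.0}
```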

Forestry is a fledgling activity in Yemen, prima- 
rily because of the generally low density of tree 
cover. However, this project will provide some of the
basic data to enable detailed planning for future
use of the resource. It will also form the basis for
planning a detailed inventory of the resource in the
future, so that accurate estimates of the biomass can
be derived.


ii) Forest change monitoring in Tanzania

The aims of this study were to examine in some detail 
the changing patterns of forest cover in an area of 
Tanzania and to identify areas suitable for replanting 
with teak. Again the input data have been prepared 
from aerial photographs, to give the distribution of 
forest in the 1950s, and high resolution satellite data 
to give the current distribution. The two maps were 
overlain and the changes in extent of various features
were quantified; it was possible to measure the loss of
mature forest and the gains in agricultural areas. This 
exercise was completed by digitizing the forest cover 
maps in a vector-based GIS and then analysing the 
changes in a raster system. This approach enabled 
the overlay procedure to be completed much more 
quickly than using the vector approach, although the 
same results could have been achieved by either 
method. The requirement for overlay modelling can 
determine the structure of the GIS software obtained 
for a particular project. 
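The raster overlay step is quick because it reduces to comparing two co-registered grids cell by cell and counting each (before, after) pair, the so-called change matrix. In a vector system the same result needs polygon intersection. A minimal sketch with invented grids and a hypothetical 50 m cell size:

```python
from collections import Counter

LEGEND = {"F": "mature forest", "A": "agriculture", "W": "water"}
CELL_HA = 0.25  # hypothetical 50 m x 50 m cells

# Co-registered rasters with invented values: forest cover interpreted from
# 1950s aerial photography and from recent satellite data.
then = ["FFFFA",
        "FFFAA",
        "FFAAA"]
now = ["FFAAA",
       "FAAAA",
       "FFAAW"]

def change_matrix(before, after):
    """Cell-by-cell overlay: count cells for every (before, after) class pair."""
    counts = Counter()
    for row_before, row_after in zip(before, after):
        for b, a in zip(row_before, row_after):
            counts[(LEGEND[b], LEGEND[a])] += 1
    return counts

changes = change_matrix(then, now)
for (was, became), cells in sorted(changes.items()):
    print(f"{was} -> {became}: {cells * CELL_HA} ha")
```

The off-diagonal entries give the changes directly; here the "mature forest -> agriculture" count is the loss of mature forest to farming.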

Once the basic data had been compiled, digitized 
and analysed, a map showing the forest units and 
areas of change was prepared. This was, however, 
only the first stage of the process. Using a three- 
dimensional surface modelling package, a digitized 
contour map can be transformed into a slope map. 
Using this as a base and knowing that suitable sites 
for teak must be flat or nearly flat, it is possible to 
determine those areas that are most suitable for re- 
planting. If other features, such as proximity to rivers 
(teak should not be planted in land liable to 
waterlogging) and proximity to roads (suitable for 
timber extraction) are included in the model, then 
only those sites satisfying all the relevant criteria can 
be flagged. At this stage the forest manager has a 
document for detailed planning of logging and re- 
planting operations. 
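The suitability model described above can be sketched as a Boolean overlay. First a slope layer is derived from an elevation grid (such a grid would itself be interpolated from the digitized contours; that step is not shown). A cell is then flagged only if it is nearly flat, not too close to a river and close enough to a road. The grid, the 100 m cell size and all three thresholds are invented for illustration.

```python
import math

CELL = 100.0  # hypothetical cell size in metres

elevation = [  # metres; invented values
    [100, 100, 101, 105, 112],
    [100, 100, 101, 106, 114],
    [ 99, 100, 100, 104, 111],
    [ 99,  99, 100, 104, 109],
]
rivers = {(3, 0)}                            # (row, col) cells containing a river
roads = {(0, 2), (1, 2), (2, 2), (3, 2)}     # cells containing a road

def slope_percent(dem, r, c, cell=CELL):
    """Slope from central differences (one-sided at the grid edges)."""
    rows, cols = len(dem), len(dem[0])
    r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
    c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
    dzdx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell)
    dzdy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell)
    return 100.0 * math.hypot(dzdx, dzdy)

def distance_to(cells, r, c, cell=CELL):
    """Straight-line distance (m) to the nearest of the given cells."""
    return min(math.hypot(r - rr, c - cc) for rr, cc in cells) * cell

def suitable(r, c, max_slope=2.0, min_river=150.0, max_road=250.0):
    """A cell is flagged only if every criterion holds (invented thresholds)."""
    return (slope_percent(elevation, r, c) <= max_slope
            and distance_to(rivers, r, c) >= min_river
            and distance_to(roads, r, c) <= max_road)

flags = [[suitable(r, c) for c in range(len(elevation[0]))] for r in range(len(elevation))]
for row in flags:
    print("".join("T" if f else "." for f in row))
```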

One further stage is to link the GIS spatial data 
output to a more specific software package capable 
of assisting the forester in detailed scheduling of 
operations, containing details of separate compart- 
ments and providing schedules for felling and other 
operations. This provides the link between a GIS and 
a management information system. 

A consideration here is the link between a na- 
tional system of data storage and the local 
requirements. In a fully integrated GIS at ministry 
level, there would be a national centre holding copies 
of all data, with regional and project offices only 
holding those data that are applicable to their own 
requirements. 

The establishment of such a system would re- 
quire careful planning in order to determine the data 
format, system networking, data transfer and quality 
control procedures. This requires the setting up of a
steering committee whose job it would be to define 
the national standards prior to implementation of a 
GIS. 


iii) Environmental sensitivity mapping 

This project involved the examination of existing 
datasets covering the Black Sea in eastern Europe. It 
illustrates how the use of GIS can help to amalga- 
mate different datasets so that although the different 
data are already in existence, and may even be very 
well documented, it is only when they are all inte- 
grated with a single spatial reference system that 
their true worth can be identified. 

In this exercise, a range of data relating to the 
occurrence of natural resources and populations of 
wildlife and the activities of man were combined. At 
this stage, following GIS overlay analysis, it was 
possible to sub-divide the area of interest into re- 
gions where the risks from development would 
potentially be the greatest. All the data used in this
exercise were in the public domain; the GIS has pro-
vided the medium by which these data can be made
useful for the purposes of environmental monitoring. 
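The text does not say how the Black Sea layers were combined. One common technique for this kind of sub-division is a weighted overlay: each co-registered layer is scored, the scores are combined with weights, and the total is sliced into risk classes. The sketch below uses that technique. The layer names, scores, weights and class thresholds are all invented.

```python
# Invented co-registered layers, scored 0 (absent) to 3 (high).
layers = {
    "wildlife":  [[3, 2, 0], [2, 0, 0]],
    "fisheries": [[2, 2, 1], [0, 1, 0]],
    "activity":  [[1, 3, 3], [0, 0, 2]],
}
WEIGHTS = {"wildlife": 0.5, "fisheries": 0.3, "activity": 0.2}  # invented weights

def risk_map(layers, weights, high=1.5, medium=0.5):
    """Weighted sum of layer scores per cell, sliced into three risk classes."""
    first = next(iter(layers.values()))
    rows, cols = len(first), len(first[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            score = sum(weights[name] * grid[r][c] for name, grid in layers.items())
            row.append("high" if score >= high else "medium" if score >= medium else "low")
        result.append(row)
    return result

for row in risk_map(layers, WEIGHTS):
    print(row)
```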

The completion of such a study is rapid because 
no new data are collected; just existing information 
is compiled in a spatially referenced way. Very 
quickly the resources manager has information that 
will enable him to locate the areas that could be 
exploited and the environmentalist or the regulatory 
body has the information that would allow them to 
identify those areas at risk if development goes ahead. 
The compilation of such information is often an 
essential stage in the execution of an environmental 
impact study. 


iv) Agricultural inventory

In a number of areas the use of satellite remote 
sensing to provide information on resources has 
reached the operational stage. For example, in Eu- 
rope there has been a major study involving the use 
of remote sensing for monitoring agricultural crop 
areas and yields; this project began in 1988 and 
several countries are now adopting the methodology 
for annual crop surveys. This project, known as 
‘MARS - Monitoring Agriculture with Remote 
Sensing’, is being undertaken by the Joint Research 
Centre of the European Community in conjunction 
with the Community's Agriculture Directorate. 

The interpretation of satellite imagery is, how- 
ever, only one part of the exercise; the use of GIS in 
conjunction with satellite image analysis is an im-
portant key to the success of the activity. There are
several important roles for GIS in this project. These 
include the integration of field data with the satellite 
image interpretations; the integration of other datasets, 
such as meteorological data; and, the overlay of 
administrative boundaries and agro-ecological units 
in order to provide statistics for particular areas of 
interest. 
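
Producing statistics for administrative areas in this way is often called zonal statistics. A minimal sketch follows, assuming a classified crop map and a zone map rasterized onto the same grid and a fixed, hypothetical cell size of 25 ha; it illustrates the general overlay idea and is not the MARS methodology itself:

```python
from collections import defaultdict

def zonal_crop_area(crop_grid, zone_grid, cell_area_ha):
    """Sum the area of each crop class within each administrative zone."""
    totals = defaultdict(lambda: defaultdict(float))
    for crop_row, zone_row in zip(crop_grid, zone_grid):
        for crop, zone in zip(crop_row, zone_row):
            totals[zone][crop] += cell_area_ha
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {zone: dict(crops) for zone, crops in totals.items()}

# Hypothetical classified image and district map on the same 2 x 3 grid.
crops = [["wheat", "wheat", "maize"],
         ["barley", "maize", "maize"]]
zones = [["A", "A", "B"],
         ["A", "B", "B"]]

stats = zonal_crop_area(crops, zones, cell_area_ha=25.0)
print(stats)
```
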

The use of GIS can also help to improve the
quality of the results. For example, in Spain it was 
possible to examine in detail the ground survey 
results in order to identify sample areas that are 
substantially different from their neighbours. From 
this analysis it was possible to identify areas where 
the cropping patterns are consistently different, 
thereby improving the accuracy of stratification of 
the area prior to calculating the statistical results. In 
another instance the identification of an area with 
abnormal results indicated a region where the crops 
had failed due to bad weather. 
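
The neighbourhood comparison described above might be sketched as follows. The grid of wheat shares, the four-neighbour rule and the threshold are all hypothetical choices made for illustration; the Spanish analysis is not described in enough detail here to reproduce it:

```python
def flag_outliers(grid, threshold):
    """Return (row, col) of cells that differ from the mean of their
    four direct neighbours by more than threshold."""
    rows, cols = len(grid), len(grid[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[r + dr][c + dc]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            if not neighbours:  # a single-cell grid has nothing to compare
                continue
            mean = sum(neighbours) / len(neighbours)
            if abs(grid[r][c] - mean) > threshold:
                flagged.append((r, c))
    return flagged

# Hypothetical share of each ground-survey sample area under wheat (0..1).
wheat_share = [
    [0.40, 0.42, 0.41],
    [0.39, 0.05, 0.43],
    [0.41, 0.40, 0.42],
]
outliers = flag_outliers(wheat_share, threshold=0.2)
print(outliers)
```

Each flagged cell is a candidate for closer inspection: it may mark a genuinely different cropping pattern, a data-entry error or, as in the example above, crop failure.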

The integrated use of remote sensing and GIS 
technology can be of use in a wide variety of resource
inventories. The MARS methodology is also being 
adopted outside Europe, in areas where accurate and 
up-to-date agricultural statistics are required. Exam- 
ples include studies in China, Bangladesh and North 
Africa. 


3) Implementing a GIS 

The preceding illustrations of GIS applications show 
a variety of ways in which the technology can be of 
use to resource managers. However, the establish- 
ment of a GIS for use at either project level or for 
major organizations requires careful consideration 
of the options that are available. Such consideration 
involves not only an examination of the hardware 
and software options that exist but also an assess- 
ment of the data output requirements and the cost of 
acquiring reliable data for input to the system. Data 
acquisition, input and maintenance costs are likely to be
more significant in the longer term than the cost of 
purchasing the GIS. 

Recent advances in the computing performance of
PCs and workstations enable GIS to be implemented
at relatively modest cost. It is
no longer necessary for large computers with special 
installation requirements and high maintenance costs 
to be at the heart of a GIS. A network based around 
PCs for data entry and data analysis (at the project 
level) integrated with workstations and central data- 
base servers for major national and regional facilities 
is now sufficient for most resource management 
applications. 

The performance of fourth-generation PCs is 
sufficient for input of data via digitizers, modelling 
and analysis as well as data output via plotters; the 
Yemen project described above was completed in 
this environment. If the data generated by this 
particular project are to be made available to a range 
of users, however, it would be necessary to consider 
establishing a central facility based around work- 
stations. The needs of the regional forest manager or 
project supervisor would be satisfied by PC technol- 
ogy. 

The choice of software is becoming ever wider, 
but in some cases less critical. Particular software 
packages have advantages for the economy of data 
storage and the ease of overlay analysis; others have 
advantages in their map preparation capabilities; and
others are especially suited to integration with satel- 
lite data and digitized aerial photographs. Some GIS 
packages are easy to learn, having been designed 
initially as a training package for introducing the 
technology and then developed into an operational 
system. 

One advantage of the continuing development of GIS
is that converting expensively digitized datasets
between the file formats of different manufacturers'
products is now much easier. It is feasible to consider
a system comprising two or more software packages, 
each dedicated to a particular function. The map 
production department may well, therefore, run a
different system from the resource planner; the first
will want high-quality map output, whilst the second
will want fast and efficient modelling and the ability
to answer ‘what if...’ questions. Both systems will
draw data from the central database facility. Other 
recent developments mean that the graphics capa- 
bilities from GIS can be integrated with non-GIS 
databases and the two are only joined when the 
spatial dimension provides a better understanding of 
the information held in these other databases. 

After the technology solution has been addressed,
careful consideration of the costs and benefits of in-
cluding various datasets is necessary. Too often every
available piece of information has been included, at
a high cost of data input, when in fact there is likely
to be little benefit in using some of the information
that has been collected. Avoidance of this problem is
the responsibility of the resource manager, who has to
be fully aware of what information is needed to make
the planning decisions; if a factor is not particularly 
important for plant growth in a given area, inclusion 
within a GIS will not increase its importance. 

If the resource manager is to utilize the system 
fully he has to be confident that the data contained 
within it are accurate and reliable. This requires 
comprehensive quality control procedures for data 
entry, including checking the source as well as the 
actual digitizing process. In addition there must be 
routines in place for regular update of information 
that has a time dimension. If these procedures are not 
in place the analyses will be flawed and the manager 
will not be able to gain the advantages that using a 
GIS brings. 
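
The kind of data-entry checks this implies can be sketched as a small validation routine. The record layout, the field names (`boundary`, `canopy_cover_pct`, `last_updated`) and the one-year update interval below are hypothetical; a real system would take its rules from the organization's own data standards:

```python
from datetime import date

def check_parcel(parcel, today, max_age_days=365):
    """Return a list of quality-control problems for one digitized parcel."""
    problems = []
    ring = parcel["boundary"]
    # A closed polygon ring repeats its first vertex at the end.
    if len(ring) < 4 or ring[0] != ring[-1]:
        problems.append("boundary ring is not closed")
    # Attribute range check on a hypothetical percentage field.
    if not 0 <= parcel["canopy_cover_pct"] <= 100:
        problems.append("canopy cover outside 0-100%")
    # Time-dependent attributes must be refreshed on a regular cycle.
    if (today - parcel["last_updated"]).days > max_age_days:
        problems.append("attribute data overdue for update")
    return problems

parcel = {
    "boundary": [(0, 0), (0, 10), (10, 10), (10, 0)],  # not closed
    "canopy_cover_pct": 120,                            # out of range
    "last_updated": date(1990, 1, 1),                   # stale
}
problems = check_parcel(parcel, today=date(1992, 6, 1))
print(problems)
```
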


4) Conclusion 

A number of practical applications of GIS technol- 
ogy have been presented. They illustrate different 
ways in which the analysis of spatial data can pro- 
vide the resource manager with information needed 
to carry out the functions of planning and manage- 
ment. This information is not necessarily ‘new’; it is 
merely presented in a more efficient and understand- 
able way. Complex datasets can be simplified and 
time-series of data can be analysed in ways that 
would be almost impossible without computer tech-
nology, and the consequences of planning decisions
can be modelled in the laboratory before irretriev- 
able mistakes are made. A practical and realistic 
approach to GIS makes it a very important tool in the 
resource manager's hands. 

Abstract


This paper proposes a number of essential requirements intended to provide direction for translation work- 
benches of the future. The points made arise from a consideration of the problems and frustrations encountered 
during several years’ experience in the use of proprietary and in-house translation tools. The paper will also 
suggest innovations which may considerably improve the productivity and flexibility of future translation workbenches.


Introduction 

It is only by keeping translation software attractive to
the increasingly demanding business world that com- 
puterized translation will move forward. Approaching 
the issue from the point of view of industry, this 
paper will attempt to summarize, albeit briefly, a few
essential aspects of translation workbench design 
which I feel provide important directions for the 
future. 

The points made are based on several years' 
experience in the use of proprietary and in-house 
translation tools at Rank Xerox. The specific recom- 
mendations in this paper arose from a detailed, 
in-house study of requirements for future translation 
workbenches conducted in 1992. 

I have chosen only a small number of key require-
ments for this paper, and have grouped them under
four topic areas: openness, flexibility, productivity 
and information management. 


Why are we using translation tools? 

It is worth briefly reminding ourselves, first, why we
use computers for translation in industry, and what 
we use them for. The hope is that computerization 
will eliminate slow or complex manual tasks in order 
to reduce translation costs and time to market. At the 
same time, translation software should effectively 
maintain or improve quality of translation so as to 
increase customer satisfaction. 

An important point, to which we will return later, 
is that, in order to achieve faster time to market, 
companies are increasingly looking at ways of per- 
forming most of the translation on early versions of 
the text, and then translating small incremental up- 
dates nearer the product launch date. Translation 
tools have an important role to play here in reducing 
the bulk of update translations. 


Definitions 

The term ‘source matching’ is used in this paper to 
cover what is often referred to elsewhere as ‘transla- 
tion memory’ or ‘repetitions processing’. A foreign 
translation retrieved by matching a string of source 
text during source matching is referred to as a
‘recall’. The term ‘change analysis’ refers to a suggested
application of ‘text alignment’ — this will be ex- 
plained in detail below. 


Overview of the translation process 

Let me set the scene for what I will say later by 
drawing a simple model of the translation process. 
Figure 1 shows, very schematically, the basic elements 
of most translation processes. At this level, these 
stages are largely the same for translation of both 
documents and software-based text. 

Note that translation software is required to assist 
in all the activities outlined in Figure 1. The transla- 
tion workbench is far more than just an editing 
environment. It is needed for administrative activi- 
ties long before and after the translator actually sits 
down to edit text. 


Project setup involves the creation of directory struc- 
tures and the completion of other necessary 
administrative activities prior to the importing of 
data. 


Dictionary building normally takes place in
advance of translation, and involves running soft-
ware which can help identify unknown words or
phrases in a corpus, obtaining validated translations
for those terms, and preparing dictionaries for access
during translation.


During the import stage, the data is transferred 
from the outside world into the translation environ- 
ment. At this point the data may be converted and 
tested for conformance to agreed formats. Segmenta-
tion of data into manageable chunks may take place
at this stage or during data preparation.

During data preparation the data is prepared for 
release to the translator. Processing at this point may 
include such things as source matching, machine
translation and change analysis, as well as the prepa-
ration of reference information such as dictionary
items and notes from designers. 

Once the material has been prepared, it is handed 
over to the translator, who will edit or supply the 
translation. 

During translation, the translator will usually 
need to ask questions about the meaning of the 
source text, ask about appropriate translations for 
specific items in the source, request more expansion 
space, etc. These queries need to be dealt with as 
quickly and efficiently as possible. This is the func- 
tion of the query management subsystem. 


The export stage removes the data from the
translation environment and puts it into the form
required for return to the customer.

Material must be proofread and then validated 
on behalf of the customer before being prepared for 
publication or loaded into the software. 

During the housekeeping stage, backups, 
archives and other administrative activities take place. 
The main dictionary database also needs to be up- 
dated in the light of changes or additions made 
during translation or validation. 

Let us now consider some essential requirements 
for developing future translation software. 


OPENNESS 

Aim for a generic translation environment 

Any large translation department has to deal with 
translation text from a wide and continually growing 
set of product environments. Each of these product
environments organizes and stores its data in differ-
ent ways.

Figure 1. Translation process overview

If tools are written specifically to translate prod- 
ucts from a particular product environment, it is 
usually difficult, sometimes impossible, to adapt 
them successfully for use later with completely new 
product environments with which the translation 
department may have to deal. 

Developing one-off solutions for the translation 
of differing environments, having to adapt tools 
significantly each time, or using a number of differ- 
ent translation tools can soak up a great deal of time, 
money and resources: 

1. New processes and tools must be developed and 
tested each time. This can place a serious strain on 
resources and schedules. 

2. Users of the translation tools have to switch 
between new environments constantly. This creates 
a very high retraining cost, and also affects the 
productivity of the user, who can become con- 
fused by the differences between each process and 
environment. 

3. The processes and tools used for translation have a 
short life and never reach a high level of stability 
and robustness. This creates significant costs in 
troubleshooting, downtime, support and rework - 
over and over again. 


Here are some suggested solutions to these 
problems: 

1. Design a single translation system capable of hand-
ling translation of software text from any product,
and documentation from any publishing system. 
This should considerably reduce development, test- 
ing, support and retraining costs (both financial 
and resource) and thus considerably reduce the 
basic translation costs. In addition, and signifi- 
cantly, since the same tools will be reused for all 
products, the tools will become stable, and that 
stability will bring further dividends in terms of 
reduced troubleshooting and support requirements. 

2. Specify standardized data interfaces or filters for 
documents, extracted software text and simula- 
tors. This is a key enabler for the previous point. 
This means that data are always passed between 
product team and translation in a format which the 
translation tools understand. Also, standard inter- 
faces to simulators (see below) need to be 
developed and implemented, so that data and com- 
mands can be passed between them and the 
translation environment. 

3. Standardize and integrate, wherever possible, in- 
terfaces and functionality provided for translation 
of documentation- and software-based text. Soft- 
ware and documentation translation can differ in a 
number of (sometimes significant) ways. Never- 
theless, wherever common functionality can be 
found it should be used for both. In particular, 
such things as editor functionality should be stand- 
ardized. (Indeed, it may be helpful to make 
translation editor functionality mimic other edit-
ing environments used extensively for other
activities in the office, such as for writing mailnotes, 
reports, etc., since this will reduce the interference 
factor for the translator. This would mean allow- 
ing for a choice of look and feel for the translation 
editor.) 

4. Consider the selection of a standard publishing 
environment in which to publish translations, and 
develop documentation in this format or develop 
programs to convert other publishing systems into 
this format. (After conversion, checking will need 
to take place in order to ascertain that all features 
in the native environment have converted cor- 
rectly, and to make changes if necessary. At this 
point, it would be possible to test for translatabil-
ity issues and make appropriate changes to enable
translation (e.g. increase line height for Japanese, 
or base line height for Arabic). This could be an 
effective way of reducing off-standard costs — bear 
in mind that off-standard costs are otherwise mul- 
tiplied by the number of languages in which 
translators experience difficulties!) 


Design truly multinational software 
Current ALPS and Systran systems as used by Xerox 
are unable to cope with requirements to translate into 
languages beyond the basic Western European set. 
At CBS, we already translate into East European 
languages, including non-Latin character sets such 
as Russian and Greek. Future requirements include 
Middle Eastern and Far Eastern languages, where 
character shape and text direction are far more com- 
plicated than those found in Western European 
languages. Indeed, there seems to be a significant, 
general growth in the importance of non-European 
markets for the software industry. 

While designing translation environments, care 
must be taken to avoid locking the user into a re- 
stricted set of languages. This means allowing for 
factors such as the following: 

1. enabling appropriate character code storage for 
languages such as Japanese, Chinese and Korean, 
or for documents containing a mixture of lan- 
guages, for which 8-bit character sets were never 
really suitable 

2. providing modules to render non-Latin characters 
appropriately if they are context sensitive (e.g. 
Arabic) or make extensive use of ligatures (e.g. 
Arabic- and Indian-based and South East Asian 
languages) 

3. enabling more complex placement of diacritics or 
vowel signs than that found in European lan- 
guages 

4. enabling bidirectional text input and editing for 
languages such as Arabic and Hebrew 

5. enabling intelligent algorithms to improve the ef- 
ficiency of input for languages such as Japanese, 
Korean and Chinese. 
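
Two of the requirements above, storage of non-Latin characters (point 1) and detection of text that needs bidirectional handling (point 4), can be illustrated with Python's Unicode strings and its standard `unicodedata` module. This is a modern sketch, not a description of the ALPS or Systran systems; rendering, ligatures, diacritic placement and input methods (points 2, 3 and 5) are outside its scope, and the sample strings are illustrative:

```python
import unicodedata

def fits_8bit(text, codec="latin-1"):
    """True if the text can be stored in the given single-byte code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def needs_bidi(text):
    """True if any character is right-to-left (Unicode bidi class R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

samples = {
    "French": "Réglages",
    "Japanese": "両面コピー",
    "Hebrew": "העתקה",
    "Arabic": "نسخ",
}
for language, text in samples.items():
    print(f"{language}: 8-bit={fits_8bit(text)}, bidi={needs_bidi(text)}")
```
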

Other factors also need to be borne in mind. For
example, algorithms which parse segments for dic- 
tionary lookup must not be hindered by the fact that 
languages such as Japanese and Thai do not separate 
words with spaces, as we do in English. 
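
One simple way to look up dictionary terms in unspaced text is greedy longest-match (forward maximum matching) segmentation. The sketch below illustrates only that general technique, using a hypothetical four-entry Japanese dictionary; greedy matching can mis-segment ambiguous strings, so it is a starting point rather than a complete solution:

```python
def longest_match_lookup(text, dictionary, max_len=8):
    """Greedy forward maximum matching: at each position take the
    longest dictionary entry that starts there, else skip one character."""
    i, found = 0, []
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                found.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            i += 1  # no entry starts at this position
    return found

# Hypothetical dictionary; the longer entry wins over its parts.
dictionary = {
    "両面": "duplex",
    "コピー": "copy",
    "両面コピー": "duplex copy",
    "開始": "start",
}
matches = longest_match_lookup("両面コピーを開始します", dictionary)
print(matches)
```
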


FLEXIBILITY

Create user-defined systems 

The translation tools will need to be flexible enough 
to allow a slightly different approach where required, 
be it for technical or organizational reasons or simply 
user preference. 

It should be possible to configure the translation 
workbench to meet the needs of the organization’s 
ideal work process. The workbench should not dic- 
tate or restrict the choice of work process. 

As much information as possible about how the 
system works should be user-defined, rather than 
hard-coded. It should be possible to modify the us- 
er’s or organization’s preferences by the use of 
definition files or property sheets which are easy to 
understand and modify. For example, users should 
be able to tell the system how to segment text, sift 
non-translatables, assign statuses, present items in 
the reference area, skip through the source text,
etc. The translation tools should include a workflow
management system which is easily adapted for dif- 
ferent types of organization. 
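
A definition file of this kind might look like the hypothetical JSON rules below, read by a small segmenter. The rule names (`break_after`, `no_break_after`, `non_translatable`) and their patterns are invented for this sketch; the point is only that segmentation and non-translatable filtering are driven by user-editable data rather than hard-coded:

```python
import json
import re

# Hypothetical user-editable definition file.
RULES_JSON = """
{
  "break_after": "[.!?]",
  "no_break_after": ["e.g.", "i.e.", "etc."],
  "non_translatable": "^[A-Z0-9_-]+$"
}
"""

def segment(text, rules):
    """Split text at sentence punctuation followed by whitespace,
    except after listed abbreviations. (Limitation: an abbreviation that
    genuinely ends a sentence will not trigger a break.)"""
    segments, start = [], 0
    for m in re.finditer(rules["break_after"] + r"\s+", text):
        candidate = text[start:m.end()].strip()
        if any(candidate.endswith(abbr) for abbr in rules["no_break_after"]):
            continue  # keep extending the current segment
        segments.append(candidate)
        start = m.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

def is_non_translatable(segment_text, rules):
    """True if the segment matches the user's non-translatable pattern."""
    return re.fullmatch(rules["non_translatable"], segment_text) is not None

rules = json.loads(RULES_JSON)
segments = segment("Load paper, e.g. A4 or Letter. Press Start. PX-200", rules)
for s in segments:
    print(s, is_non_translatable(s, rules))
```
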


Allow flexibility between batch and interactive 
translation 
Machine translation is best suited for simple, 
sublanguage texts where information is explicit. For 
such texts, machine translation can give important 
productivity gains. As your source text moves from 
simple text, such as parts lists or service manuals, to 
such things as training manuals or marketing bro- 
chures, where stylistic variation, reliance on context 
and language complexity become relatively more 
important, the productivity of machine translation is 
rapidly eroded by the increasing need to post-edit. 
A large number of companies deal with both 
simple and complex texts. For this reason an ideal 
translation workbench should allow an organization 
to translate using machine translation and/or interac- 
tive translation, wherever each is most appropriate. 
This may not mean designing your own machine 
translation system as part of your workbench. You 
may be able simply to provide filters and links to 
access an existing system. 


Enable decentralized translation 

Especially where new markets are opening quickly, 
translation departments are likely to need to do the 
translation itself in remote locations, while adminis- 
tering and processing the data from a central location. 
The translation system of the future must enable this 
interaction between the remote translator and the 
central hub. 

The platform on which the editing software is
based should be easily obtainable, cheap and port- 
able. It must also be robust, and allow for simple 
installation and remote troubleshooting, while pro- 
viding as much useful functionality as possible to the 
translator. 


Allow for future enhancements 

Software must be written in such a way that future 
enhancements can be easily added without major 
rework. The aim is to avoid having to develop or find 
a new translation environment again — just simply 
refine what we have. 


PRODUCTIVITY 

Enable fast and simple administration 

With a number of existing systems there can be high 
costs in terms of time and resource for administrative 
activities such as importing and exporting of files, 
data preparation, dictionary management and dic- 
tionary building. In fact, this is a key problem for the 
use of computerized translation tools in general. 
Only companies which can afford to carry these 
overheads can invest in the translation tools which 
will bring them greater productivity. 

Additionally, there is a cutoff point in the size of 
translation jobs, below which it is deemed more cost- 
effective to translate manually because of the 
administration costs. (Manual translation of a small
update to a product can then become problematic 
because all the information needed for source match- 
ing and change analysis at the time of the next 
revision is lost. Manual translation also makes it 
harder to maintain translation quality in terms of 
consistency and validated terminology.) 
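
The cutoff can be made explicit with a simple cost model. The symbols are assumptions introduced for this illustration, not quantities from the study: A is the fixed administrative cost of putting a job through the tools, c_t the per-word cost of translating with the tools, c_m the per-word cost of manual translation (with c_m > c_t), and N the job size in words:

```latex
\begin{aligned}
C_{\text{manual}}(N) &= c_m N, \qquad C_{\text{tools}}(N) = A + c_t N,\\
C_{\text{tools}}(N) < C_{\text{manual}}(N) &\iff N > N^{*} = \frac{A}{c_m - c_t}.
\end{aligned}
```

With purely hypothetical figures A = 200, c_m = 0.10 and c_t = 0.06 (in any currency), N* = 200 / 0.04 = 5000 words. Halving A halves the cutoff, and as A approaches zero so does N*, which is the sense in which the size of a job should make no difference.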

Translators’ workbenches need to reduce admin- 
istrative activities to the minimum in all aspects of 
the translation work. It must be possible to import, 
prepare, export and manage jobs quickly enough that 
the size of the job makes no difference. One way to 
achieve this is to reduce as far as possible the number 
of points at which the user has to intervene in order to 
run the software. However, I also wish to suggest a 
potential solution here based on a concept we shall 
call the ‘job profile’. 

The job profile is a collection of definition files 
and parameters which define how the translation 
processes work, and the work flow for a given job or 
collection of jobs. There should be a single, simple 
interface for the user which groups together all the 
information for viewing or modification.

Amongst other things, the job profile should 
define: 

• rules for segmentation of text during data import

• target languages for the data preparation stage
(this will automatically create directories and tar-
get files for all languages simultaneously)

• whether or not source matching, change analysis
or machine translation should be used, and if so,
what databases should be used for comparison,
how information should be ranked for presenta-
tion to the user, what text, if any, should be
automatically inserted into the target file, etc.

• rules for the filtering of non-translatables

• what stages the jobs will move through during
translation, and what criteria must be satisfied to
move from one stage to another

• what wrapping rules and default editing param-
eters are appropriate for the current job.

Restricted access rights should be associated with 
certain types of information, and certain defaults 
should also be available. 

If product specific changes have to be made, only 
the appropriate parts of the profile should be modi- 
fied. It should be easy to access and change such 
information. 

One of the key benefits of the job profile is that it
can be easily copied from project to project with
minimal changes (often no change). When dealing
with similar types of text, the use of job profiles can
thus reduce the administrative setup time consider-
ably: once the data has been imported, the whole
process of data preparation for any number of target
languages could be initiated by a single click on a
button, since all the information needed by the sys-
tem is already contained in the job profile.
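
A minimal sketch of a job profile as a data structure, assuming a Python implementation; every field name, the stage list and the directory layout are hypothetical, and the criteria for moving between stages are not modelled. It shows two of the properties described above: directories for all target languages are created in one call, and a profile can be copied for a new job with only the differing fields changed:

```python
from dataclasses import dataclass, field, replace
from pathlib import Path

@dataclass
class JobProfile:
    """Hypothetical bundle of the settings a job profile would hold."""
    segmentation_rules: str
    target_languages: list
    use_source_matching: bool = True
    use_change_analysis: bool = True
    use_machine_translation: bool = False
    reference_databases: list = field(default_factory=list)
    non_translatable_pattern: str = r"^[A-Z0-9_-]+$"
    stages: list = field(
        default_factory=lambda: ["prepared", "translated", "proofread", "validated"]
    )
    line_width: int = 72

    def next_stage(self, current):
        """Return the stage after `current`, or None at the end of the workflow."""
        i = self.stages.index(current)
        return self.stages[i + 1] if i + 1 < len(self.stages) else None

    def prepare_directories(self, root):
        """Create one working directory per target language in a single step."""
        paths = [Path(root) / lang for lang in self.target_languages]
        for p in paths:
            p.mkdir(parents=True, exist_ok=True)
        return paths

base = JobProfile(segmentation_rules="sentence.rules",
                  target_languages=["fr", "de", "it"])
# Reuse the profile for a similar job, changing only what differs.
# (dataclasses.replace makes a shallow copy: unchanged lists are shared.)
next_job = replace(base, target_languages=["fr", "de", "it", "es"])
print(next_job.target_languages, base.next_stage("translated"))
```
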


Enhance user friendliness and efficiency 
For a variety of reasons, it is not hard to find transla-
tion systems where user friendliness could be
improved. For example, systems may have been put 
together quickly to meet schedule requirements, they 
may have been based on tools originally intended for 
other purposes or they may simply have been devel- 
oped without due attention to users’ needs. 

All systems I have so far used leave a good deal of 
scope for human factors improvements. 

There are those who would say that real produc- 
tivity gains are made by looking at ways of 
preprocessing the data, rather than paying attention 
to the user interface. I agree that one must seriously 
look at the preprocessing in order to gain productiv- 
ity (in fact, I will be making some recommendations 
along these very lines a little later). Nevertheless, I 
feel that productivity can be significantly improved 
by attention to the user interface in the following 
ways: 

1. The greater the efficiency of the user interface, the 
greater the productivity of the translator or 
validator. This is especially true for the highly 
repetitive actions which occur during editing. Care- 
ful thought should be given to the way the 
translator/validator works, and how the system 
can be made to respond to the user’s needs as 
quickly and as simply as possible. For example: 
How can appropriate dictionary information be 
inserted into the target document faster? How can 
the user access the parts of the text they want to 
work on faster? How can the user raise queries 
faster? 

2. User friendly interfaces will reduce the time spent
in training and retraining users. This is particularly 
relevant where translation staff is subject to high 
turnover or where the translators have to work 
with a number of different translation environ- 
ments in the course of their work. The interface to 
the translation tools must be as simple, informa-
tive and intuitive as possible. Great attention needs
to be paid to the way the user sees what he or she 
is doing, in order to allow for easy use of what 
could, otherwise, become a complicated system. 

3. Streamlining and simplifying user interfaces for 
administrative tasks such as preparing the transla-
tion data, terminology development, handling of
queries, etc., can often greatly reduce the over- 
heads in cost and scheduling associated with the 
actual translation exercise. The points at which the 
user intervenes need to be kept as simple and few 
as possible. 

One potentially very useful tool for translators 
would be a ‘morph modifier’. This would allow 
translators quickly and easily to post-edit word end- 
ings, agreements, etc. Automatic adjustment of 
endings, etc., when pulling items from the dictionary 
into the target text would also improve productivity. 


Minimize the work of the translator 

If we are to achieve simultaneous multinational launch
of products into the worldwide marketplace, we need
to ensure that translation is as efficient and produc-
tive as possible. One of the key areas in which
translation productivity can be tackled occurs before
the translator actually sits down to edit text. The
objective of the data preparation stage should be
twofold:

1. to reduce, as far as possible, the amount of text the 
translator has to translate 

2. to provide as much assistance for the translator as 
possible while he or she is editing or supplying text 
(without swamping or slowing him or her down!) 

It seems to me that there is the potential in the 
future to improve greatly the productivity of update 
translation by an integrated approach which includes 
an application of the concept of text alignment. I will
refer to this approach as ‘change analysis’. 

Figure 2 illustrates the composition of an exam- 
ple document or software text extraction which has 
been updated. (The percentage figures for each com- 
ponent part are chosen so as to illustrate clearly the 
points which will be made.) The figure shows a
document or software update of which 70% of the
source has not been changed since the previous
version. A further 5% of the text is composed of
segments which will not need to change in transla-
tion. Of the remaining 25%, 15% of segments have
source text which will exactly or approximately match
that of previously translated text held in databases;
it is therefore labelled ‘familiar text’. The remaining
10% is labelled ‘unfamiliar text’.

Figure 2. Composition of the example update

If the text in our example is simply sent to a
machine translation system, there is no way of carry- 
ing forward into the new document the lessons learned 
from post-editing the previous mistakes. All the mis- 
takes made last time will be repeated and will have to 
be post-edited again. In addition, machine transla- 
tion systems are still only of limited use for complex 
text. 

The predominant way of tackling this problem at 
present is through the use of source matching. Source 
matches are obtained by matching the current source 
text in one way or another against that of previously 
translated databases. Where an exact or approximate 
match is found, the corresponding target segment is 
presented to the translator in a reference area or in the 
target text itself. 
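
The lookup can be sketched with Python's standard `difflib.SequenceMatcher`, whose `ratio()` serves here as a stand-in similarity score; the 0.75 threshold and the small English-French memory are hypothetical, and translation-memory products use their own matching methods. Each segment is matched in isolation, with no knowledge of its neighbours:

```python
from difflib import SequenceMatcher

def source_match(segment, memory, min_score=0.75):
    """Return (score, source, target) recalls for a segment, best first."""
    recalls = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= min_score:
            recalls.append((score, source, target))
    recalls.sort(key=lambda r: r[0], reverse=True)
    return recalls

# Hypothetical translation memory (previously translated source -> target).
memory = {
    "Press the Start button.": "Appuyez sur la touche Marche.",
    "Press the Stop button.": "Appuyez sur la touche Arrêt.",
    "Open the front cover.": "Ouvrez le panneau avant.",
}
recalls = source_match("Press the Start button.", memory)
for score, source, target in recalls:
    print(f"{score:.2f}  {source}  ->  {target}")
```

The exact match scores 1.0; the near match on "Stop" is also recalled and would need post-editing if chosen.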

Many existing systems already rely on source 
matching (i.e. repetitions or translation memory) to 
supply potential translations for all of the unchanged 
and familiar text indicated on the diagram. Source 
matching, however, provides only guesses at possi- 
ble translations, usually taken in isolation from the 
context of the segment; therefore they must all be 
checked carefully. 

Where the recalled text is not exactly what is 
required, post-editing must take place or a new target 
segment must be typed in. This is far from ideal when 
you consider that 70% of the source text had not 
changed anyway. 

I propose that future systems can gain signifi- 
cant productivity benefits for update translations by 
introducing the concept of change analysis. If a 
program is run on the data to ascertain which parts
of the text are unchanged from the previous ver- 
sion, and label those appropriately, the translation 
tools should be able to insert the previous transla- 
tions from the database into the target document 
automatically. It is important to bear in mind that 
approaching the task in this way relies on a high 
degree of certainty that the translation supplied was 
exactly the same as that associated with this very 
piece of source text the last time around. If the 
translation system labels these items of text, the 
translator should be able to skip unchanged text 
automatically if desired. The translator should still 
be able to make edits to unchanged text, if wished — 
for example to ensure that unchanged text preced- 
ing or following on from changed text flows correctly 
— but if there are, say, ten pages of uninterrupted 
text which have not been changed, the translator 
should be able to skip right past them. (Again, this 
is usually acceptable since, unlike source matching,
the system has used contextual information to find 
the exact same translations as were used previously 
for these source segments.) Anything the translator 
misses should be captured during the validation 
stage. 
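One way to approximate the proposed change analysis is to align the new source against the previous version as whole sequences of segments. A previous translation is then carried forward only for segments that the alignment places in corresponding positions, not for any isolated string match. The sketch below is not the author's program. It assumes the text is already split into segment lists and that the old source and old target lists are parallel. It uses `difflib.SequenceMatcher` as a generic sequence aligner. `change_analysis` and the 'unchanged'/'changed' labels are hypothetical names, and a production system would need a more robust alignment.

```python
from difflib import SequenceMatcher
from typing import Optional


def change_analysis(old_source: list[str], old_target: list[str],
                    new_source: list[str]) -> list[tuple[str, str, Optional[str]]]:
    """Label each new source segment 'unchanged' (previous translation carried
    forward) or 'changed' (needs the translator)."""
    # Align whole segment sequences, so reuse depends on position in the
    # document rather than on an isolated string match.
    aligner = SequenceMatcher(None, old_source, new_source, autojunk=False)
    labelled: list[tuple[str, str, Optional[str]]] = []
    for tag, i1, i2, j1, j2 in aligner.get_opcodes():
        if tag == "equal":
            # Identical, aligned segments: insert the old translation automatically.
            for k in range(j2 - j1):
                labelled.append(("unchanged", new_source[j1 + k], old_target[i1 + k]))
        else:
            # 'replace' or 'insert' yields new or edited segments; 'delete'
            # yields none, because then j1 == j2.
            for segment in new_source[j1:j2]:
                labelled.append(("changed", segment, None))
    return labelled


old_source = ["Load the paper.", "Close the cover.", "Press Start.", "Wait for the beep."]
old_target = ["Chargez le papier.", "Fermez le capot.", "Appuyez sur Démarrer.", "Attendez le bip."]
new_source = ["Load the paper.", "Close the front cover.", "Press Start.", "Wait for the beep."]

result = change_analysis(old_source, old_target, new_source)
# The translator can skip straight past everything labelled 'unchanged'.
to_review = [segment for label, segment, _ in result if label == "changed"]
print(to_review)  # ['Close the front cover.']
```

The labels are what allow the editor to skip unchanged runs while still letting the translator open them, for instance to smooth the text either side of an edited segment.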


Implementation 

How would one implement such a system? It may 
seem that this is not a trivial problem for documenta- 
tion text, but programs and methods do already exist 
which may achieve reasonable results. 

Rank Xerox has already implemented such a 
screening system at the level of complete software 
messages, via the use of message identifiers (IDs). 
We ask the product teams supplying software to 
associate each message with a unique ID which is 
unchanging throughout the product life. We use this 
ID to locate the previous translation of the message 
or icon in our database, and compare the text and 
properties against the new version. If there is no 
significant change, we can automatically insert the 
previous translation into the new database and the 
translator does not have to see the message. 
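The message-ID screening described above amounts to a keyed comparison. In the sketch below, several details are assumptions made for illustration and are not taken from the Rank Xerox system: the `Message` record, its `properties` tuple (standing in for whatever attributes a real message carries), and the rule that "no significant change" means identical text after whitespace normalization together with identical properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str        # unique ID, unchanging throughout the product life
    text: str
    properties: tuple  # hypothetical stand-in, e.g. (widget type, maximum width)


def normalized(text: str) -> str:
    # Hypothetical 'significance' rule: differences in spacing are ignored.
    return " ".join(text.split())


def screen_messages(previous: dict[str, Message], translations: dict[str, str],
                    current: list[Message]) -> tuple[dict[str, str], list[Message]]:
    """Return (translations reused automatically, messages the translator must see)."""
    reused: dict[str, str] = {}
    to_translate: list[Message] = []
    for msg in current:
        old = previous.get(msg.msg_id)  # look up the previous release by ID
        unchanged = (old is not None
                     and msg.msg_id in translations
                     and normalized(old.text) == normalized(msg.text)
                     and old.properties == msg.properties)
        if unchanged:
            reused[msg.msg_id] = translations[msg.msg_id]
        else:
            to_translate.append(msg)
    return reused, to_translate


previous = {"M1": Message("M1", "Paper jam", ("label", 20)),
            "M2": Message("M2", "Tray empty", ("label", 20))}
translations = {"M1": "Bourrage papier", "M2": "Magasin vide"}
current = [Message("M1", "Paper  jam", ("label", 20)),    # spacing only
           Message("M2", "Tray 2 empty", ("label", 20)),  # text changed
           Message("M3", "Toner low", ("label", 20))]     # new message

reused, pending = screen_messages(previous, translations, current)
print(reused)                       # {'M1': 'Bourrage papier'}
print([m.msg_id for m in pending])  # ['M2', 'M3']
```

The scheme depends entirely on the product teams keeping each ID stable. If an ID were reused for different text, the text comparison would catch it. A renamed ID for the same text would simply be sent for translation again.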


Additional productivity benefits 

Change analysis can have additional productivity 
benefits. Sometimes, for example, only a small change 
may have been made to a long message. In some 
cases it might only be a letter which has been put in 
upper case, or an article which has been added to the 
source — things which usually don’t affect the trans- 
lation. In other cases, it may only be a comma which 
has changed, but this may affect the translation sig- 
nificantly. In these cases, the change analysis program 
should draw the attention of the translator immedi- 
ately to the actual changes made to the source, so that 
the impact on the previous translation can be quickly
assessed. Otherwise the translator may spend a lot of 
unnecessary time scrutinizing the text to find the 
changes. 
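A word-level difference is one simple way to draw the translator's attention to exactly what changed. The sketch below again uses `difflib.SequenceMatcher`, this time over words. The `[-old-]`/`[+new+]` markup and the separate case-only check are hypothetical conventions, not features of any particular tool. Because punctuation stays attached to its word, a changed comma is flagged rather than silently ignored.

```python
from difflib import SequenceMatcher


def highlight_changes(old: str, new: str) -> str:
    """Show the new source with removed words as [-old-] and added words as [+new+]."""
    old_words, new_words = old.split(), new.split()
    marked: list[str] = []
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.extend(new_words[j1:j2])
            continue
        if i1 < i2:  # words removed or replaced
            marked.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words inserted or replacing them
            marked.append("[+" + " ".join(new_words[j1:j2]) + "+]")
    return " ".join(marked)


def case_only_change(old: str, new: str) -> bool:
    """True when the only difference is upper/lower case."""
    return old != new and old.lower() == new.lower()


print(highlight_changes("Remove the paper tray", "Remove the upper paper tray"))
# Remove the [+upper+] paper tray
print(highlight_changes("Save the file, then exit", "Save the file then exit"))
# Save the [-file,-] [+file+] then exit
print(case_only_change("Paper jam", "paper jam"))  # True
```

A case-only change could be presented as probably harmless, while any other marked change is left for the translator to judge.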

An integrated approach

Change analysis does not, by any means, do away 
with the need for source matching. It should be seen 
as only one of a series of operations on the text 
during the data preparation stage. Having identified 
the unchanged text in our example above and copied 
the old translations to the target file, we have 30% of 
segments remaining. 

Our example shows that 5% of segments contain 
text which will remain unchanged in translation (for 
example, numbers). Determining what constitutes a 
‘non-translatable segment’, and how to deal with it, 
should actually be done on a language by language 
basis. For example, decimal numbers need transla- 
tion in some European languages but not in others. 
All numbers may need to be translated for Saudi or 
Thai markets. The key point is that, if the tools 
contain the appropriate information, non-translatables 
can be transferred automatically to the target docu- 
ment and leave the translator with more time to 
address the real translation work. 
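Per-language handling of non-translatables can be driven by data, as the passage suggests. The sketch below treats a segment as non-translatable only if it is a bare number. It holds hypothetical per-language rules (decimal separator and digit set) in a table. French, for instance, writes a decimal comma where English uses a point. Several choices are illustrative only: the specific rule values, the language codes, and the assumption that Arabic-Indic digits would be wanted for the Saudi market.

```python
import re
from typing import Optional

# A segment that is nothing but a number, e.g. "42" or "3.5".
BARE_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")

ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
THAI_DIGITS = "".join(chr(0x0E50 + d) for d in range(10))

# Hypothetical per-language rules: decimal separator and digit set
# (None means keep the digits 0-9 unchanged).
RULES = {
    "en": {"decimal": ".", "digits": None},
    "fr": {"decimal": ",", "digits": None},
    "ar": {"decimal": "\u066b", "digits": ARABIC_INDIC_DIGITS},  # Arabic decimal separator
    "th": {"decimal": ".", "digits": THAI_DIGITS},
}


def transfer_non_translatable(segment: str, language: str) -> Optional[str]:
    """Return the target form of a non-translatable segment, or None if the
    segment needs a translator."""
    if not BARE_NUMBER.fullmatch(segment):
        return None
    rule = RULES[language]
    result = segment.replace(".", rule["decimal"])
    if rule["digits"] is not None:
        result = result.translate(str.maketrans("0123456789", rule["digits"]))
    return result


print(transfer_non_translatable("3.5", "en"))        # 3.5
print(transfer_non_translatable("3.5", "fr"))        # 3,5
print(transfer_non_translatable("Paper jam", "fr"))  # None
```

Keeping the rules in a table rather than in code is what lets the same transfer step serve every target language.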

Continuing with our example, we now have 25% 
of segments left. This is all text which has been 
changed in some way. Of this, three fifths will actu- 
ally match against the source of other databases 
during source matching. Alternative recalls³ should
be ranked and provided either in a reference area, or 
a mixture of target file and reference window, ac- 
cording to preference. Whereas, with most current 
systems, the translator has to check that recalls are 
valid in their context for 80% of our example, use of 
change analysis has reduced this activity to 15% of 
the text. 
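The bookkeeping in this example can be checked with a few lines of arithmetic. This is only a worked restatement of the illustrative composition in Figure 2. The stage names follow the text, and the function name is hypothetical.

```python
from fractions import Fraction

# Composition of the example update (Figure 2), in percent of segments.
COMPOSITION = {"unchanged": 70, "non-translatable": 5, "familiar": 15, "unfamiliar": 10}

# Each preparation stage takes one category off the translator's workload.
STAGES = [("change analysis", "unchanged"),
          ("non-translatable transfer", "non-translatable"),
          ("source matching", "familiar")]


def remaining_after_each_stage(composition: dict[str, int],
                               stages: list[tuple[str, str]]) -> list[tuple[str, int]]:
    remaining = sum(composition.values())  # 100% of segments to start with
    result = []
    for stage, category in stages:
        remaining -= composition[category]
        result.append((stage, remaining))
    return result


for stage, left in remaining_after_each_stage(COMPOSITION, STAGES):
    print(f"after {stage}: {left}% of segments left")

# Share of the changed text (familiar + unfamiliar) that source matching recalls.
print(Fraction(COMPOSITION["familiar"], COMPOSITION["familiar"] + COMPOSITION["unfamiliar"]))  # 3/5
```

The stages leave 30%, then 25%, then 10% of segments, and three fifths of the 25% of changed text is recalled by source matching.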

The remaining 10% of segments in the example 
could be translated in one of two ways. They could 
be translated interactively with the assistance of 
dictionary information in the reference window, or 
they could be sent off to a machine translation sys- 
tem and post-edited — whichever is most appropriate 
for the material and circumstances in question. 


INFORMATION MANAGEMENT 

Improve translation context 

The translator needs to understand the context of a
sentence or word he or she is translating for two
fundamental reasons:

1. to understand clearly the meaning of the text and 
its relationship to surrounding text 

2. to verify that the translation fits in with space and 
formatting constraints and the surrounding text. 

The translator needs to see text with as much 
contextual information as possible. Presenting text 
in paragraphs, rather than segment by segment, is a 
good first step towards improving contextual infor- 
mation. 

For translation of software messages, Rank Xerox 
has used a system for several years which shows the 
text in the editor within the display box into which it 
must fit. Given the lack of expansion space typically 
provided for foreign languages by English-speaking 
UI designers, this is an invaluable tool. It means that
the translator can immediately tell if the translation 
doesn’t fit, and can try an alternative translation or 
abbreviation. This avoids an extremely long-winded, 
cyclical process of translating, loading up software, 
retranslating, loading up again, and so on. 
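The check behind such a display can be sketched as a width calculation. The sketch below is not the Rank Xerox editor. It assumes a hypothetical table of per-character pixel widths, with a default for unlisted characters. A real editor would measure text in the actual UI font, which this does not attempt.

```python
# Hypothetical glyph widths in pixels; unlisted characters use DEFAULT_WIDTH.
CHAR_WIDTHS = {"i": 3, "l": 3, " ": 4, "m": 10, "w": 10, "M": 11, "W": 12}
DEFAULT_WIDTH = 7


def text_width(text: str) -> int:
    return sum(CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)


def fits(text: str, box_width: int) -> bool:
    """True if the text fits in a display box `box_width` pixels wide."""
    return text_width(text) <= box_width


def first_fitting(candidates: list[str], box_width: int):
    """Return the first candidate translation or abbreviation that fits, else None."""
    for candidate in candidates:
        if fits(candidate, box_width):
            return candidate
    return None


print(first_fitting(["Imprimer maintenant", "Imprimer"], 60))  # Imprimer
```

Running the check as each message is edited is what removes the translate, load, retranslate cycle.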

Rank Xerox also has a simulator of the user 
interface, linked online into the translation editing 
environment. The simulator automatically keeps in
step with the message shown in the editing environ- 
ment. This is invaluable for dealing with things such 
as adjectival agreement (e.g. the word ‘enabled’ 
appearing on its own must be translated differently in 
some languages depending upon the gender or number 
of the text it qualifies). 


Facilitate the sharing of data 

In most systems, translators currently work in rela- 
tive isolation. There is no simple mechanism for 
immediate sharing of useful information. 

For example, if a translator makes a change to a 
dictionary, that change should be communicated 
immediately and automatically to other translators 
dealing with the same target language (but not those 
dealing with other languages). 
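Immediate, language-filtered sharing of dictionary changes is essentially publish/subscribe keyed on the target language. In this sketch, `DictionaryHub` and its methods are hypothetical. Delivery is an in-process callback rather than the network transport a real system would need.

```python
from collections import defaultdict
from typing import Callable

# Callback receiving (source term, target term) when a dictionary entry changes.
Listener = Callable[[str, str], None]


class DictionaryHub:
    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def register(self, target_language: str, listener: Listener) -> None:
        self._listeners[target_language].append(listener)

    def publish_change(self, target_language: str, source_term: str, target_term: str) -> None:
        # Only translators working into the same target language are notified.
        for listener in self._listeners.get(target_language, []):
            listener(source_term, target_term)


received = {"anna (fr)": [], "ben (fr)": [], "carl (de)": []}
hub = DictionaryHub()
hub.register("fr", lambda s, t: received["anna (fr)"].append((s, t)))
hub.register("fr", lambda s, t: received["ben (fr)"].append((s, t)))
hub.register("de", lambda s, t: received["carl (de)"].append((s, t)))

hub.publish_change("fr", "tray", "magasin")
print(received)  # both French translators get the change; the German one does not
```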

There should also be an automatic way of provid- 
ing documentation translators with the appropriate 
translation for a screen-based icon which was trans- 
lated previously. 

Similarly, query management systems are often 
unwieldy and labour intensive, and comprehension 
queries tend to be raised many times over. In an ideal 
world, translators would immediately know whether 
a comprehension query had already been raised about
a particular piece of source text, and would be able to 
subscribe themselves to that query for as long as they 
felt they needed to know the answer. Queries should 
travel quickly and intelligently across networks or 
modems to and from defined addressees. 
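The query behaviour wished for here can be sketched as a shared register keyed by source text. The `QueryBoard` class and its methods are hypothetical. Inboxes are held in memory, whereas the text envisages delivery across networks or modems. Two cases are handled: a translator who asks about text already queried is subscribed to the existing query instead of raising a duplicate, and anyone joining after the answer receives it at once.

```python
from collections import defaultdict


class QueryBoard:
    """Hypothetical shared register of comprehension queries, keyed by source text."""

    def __init__(self) -> None:
        self.questions: dict[str, str] = {}
        self.answers: dict[str, str] = {}
        self.subscribers: dict[str, set[str]] = defaultdict(set)
        self.inboxes: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def raise_or_join(self, translator: str, source_text: str, question: str) -> bool:
        """Raise a new query, or subscribe to an existing one. Returns True if new."""
        is_new = source_text not in self.questions
        if is_new:
            self.questions[source_text] = question
        self.subscribers[source_text].add(translator)
        if source_text in self.answers:
            # Already answered: deliver the answer immediately.
            self.inboxes[translator].append((source_text, self.answers[source_text]))
        return is_new

    def unsubscribe(self, translator: str, source_text: str) -> None:
        """Stop following a query once the answer is no longer needed."""
        self.subscribers[source_text].discard(translator)

    def answer(self, source_text: str, reply: str) -> None:
        self.answers[source_text] = reply
        for translator in sorted(self.subscribers[source_text]):
            self.inboxes[translator].append((source_text, reply))


board = QueryBoard()
print(board.raise_or_join("anna", "Toggle the tray.", "Which tray is meant?"))  # True
print(board.raise_or_join("ben", "Toggle the tray.", "Tray 1 or tray 2?"))      # False
board.answer("Toggle the tray.", "Tray 2.")
print(board.inboxes["ben"])  # [('Toggle the tray.', 'Tray 2.')]
```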


SUMMARY 

Suggestions for innovation 

• The use of change analysis to reduce the amount of
work involved in update translation.

• The use of job profiles to reduce the need for
administrative intervention.

• The representation of available space for software
messages within the editing environment so as to
eliminate the cyclical approach to ensuring that
text fits on the UI.

• The use of simulators linked online to the editing
environment, to improve the context available to
translators.


Recommendations 

Openness 

• Aim for a single translation environment which
can interface to any software or documentation
product environment.

• Design software in such a way that it can easily
support, or be extended to support, any number of
languages, according to the changing demands of
the business.


Flexibility 

• Give the user greater control over how the system
is used by making it more data-driven.

• Allow for a mixture of machine translation and
other forms of translation within the same docu-
ment.

• Enable decentralized translation from a central-
ized administrative hub.


Productivity 

• Reduce the complexity and amount of administra-
tive support wherever possible.

• Enhance the user friendliness and productivity of
the translation environment wherever possible.

• Minimize the work of the translator by carefully
examining ways of building the target document
automatically.

Information management

• Improve the context available to the translator by
whatever means possible.

• Enable immediate and intelligent sharing of useful
information among translators and support staff.


NOTES 


1. ‘Source matching’ is a Xerox term corresponding 
to ‘repetitions processing’, ‘translation memory’ or 
‘example-based translation’. A translated string re-
trieved via source matching is referred to as a ‘recall’.


2. ‘Change analysis’ is a means of detecting un-
changed text during updates and automatically
transferring the appropriate translation into the tar-
get text. It will be dealt with in more detail below, but
it is a practical application of ‘text alignment’.


3. A recall is the foreign text retrieved from the
previous database during source matching which 
corresponds to the source segment matched. 

Abstract

One of the major areas of concern of information management is the collective use, exchange and development of
the information activities of the organization in order to achieve its objectives. For this to be effective, either
information management will require the right kind of environment or culture in which to operate, or it will have
to be instrumental in creating that culture. An organization’s culture is shaped by many factors: history,
experience, values, beliefs, successes, failures, the environment in which it operates, and the personalities which
lead it. Culture, however, cannot be precisely defined because it is something that is perceived, something that is
felt. It also has much to do with the way people are managed. Information management can have an impact on the
organization’s culture, although the opposite is more often the case, especially in terms of organizational structure,
the distribution of power, the organization’s image, the style of the chief officer, risk-taking and change, secrecy
and openness, and the way people work together and cooperate, or don’t. Information management is also about
how people interact with systems. There are, in brief, two contrasting ways in which this can happen: one which
puts systems first, the other which puts people first. Evidence is now growing that people must come first and can
no longer work in purely mechanistic ways in which they become data-processors or number-crunchers. An ideal
working culture may be one where people can develop satisfactorily with the systems to which they contribute but
which do not put them in a straitjacket. How can this be achieved? What kinds of culture generally exist within
organizations? Are there any types of culture which may be more conducive and receptive to information
management? Similarly, can the culture of an organization be changed? Can it be managed?


Organizational culture and its relationship with 
information management 
I must start by saying that I am not really sure what an
information organization is. I suppose that the closest
might be a library. A little while ago I heard
librarianship described as ‘Managing information
resources for people.’ I would suggest that that is an
idealized view and that information management has
evolved as a function purely and simply because
librarianship had not responded to the challenge of
that definition. Information management has also
evolved, as a function and a concept, because it is
organization-based. This leads us to a different defi-
nition for IM: the effective management of informa-
tion resources and activities to contribute towards
the achievement of the organization’s objectives. IM
therefore relates to an organization’s behaviour, or
its culture. I would not contend that culture is a
synonym for behaviour. Culture is much more than
that: it is the buzzword of the ’80s (and the ’90s). It
is the answer to everything, but a culture cannot be
precisely defined, for it is something that is perceived,
something felt’. Culture is shaped by many factors:
history, experience, values, beliefs, the environment,
the leader’s personality and so on. Edgar Schein de-
scribes two very different organizations’.


TWO CONTRASTING CULTURES 


Organization A operates under the assumption that: 

• ideas come ultimately from individuals

• people are responsible, motivated and capable of
organizing themselves

• truth can only be arrived at by fighting things out in
groups

• such fighting is possible because members of the
organization see themselves as a family who will take
care of each other

Characterized by:

— open office landscapes

— few closed doors

— people milling about in conversation and discussion

— an air of informality

Organization B operates under the assumption that: 

• truth comes ultimately from older, wiser and higher
status people

• people are capable of loyalty and discipline in carrying
out instructions

• relationships are basically lineal and vertical

• each person has a niche in the organization which
cannot be invaded

Characterized by: 

— a hush in the air 

— closed doors 

— agenda and pre-arranged appointment 

— deference and obedience to rank 

— formality 

If information management is all about the col-
lective harnessing, interaction, free exchange and
development of the information activities of the or-
ganization’s staff, then clearly we have to plump for
Organization A as the kind of environment in which
we should want IM to flourish. Organization A
embodies many of the values and benefits of the
flatter organization and empowerment now sought in
the 1990s as a more effective stance or structure on
which to base competitive survival. Clearly the strict
disciplines, rigid job descriptions and people-in-boxes 
with rigidly defined functions and activities of 
Organization B are to be consigned to a past culture. 
OK perhaps for a time of stability, but no good in 
times of rapid change. 

Some seven years ago at an Aslib conference
entitled The Adaptive Information Manager, a number
of excellent papers emerged, one of which was called
‘The impact of IM on corporate cultures’ by Woody
Horton’. Horton draws up a list of the potential or actual
impacts of IM on the organization’s culture:
1. Organizational structure: if IM means the inte-
gration of information technologies, generating
cross-departmental information programs, organ-
izing so that information activities can contribute
to the achievement of the organization’s objec-
tives, then this also implicitly means ‘reshuffling
people and blocks on the organizational charts’.

2. Redistribution of power. IM can lead to a reversal
of the old command and control organization with 
its hierarchies and vertical chains of command. 
Integrated information planning can contribute 
not only to the effective decentralization of deci- 
sion-making, but also faster, flatter and more 
responsive organizational structures. 

3. Image and personality of the organization: intro-
verted or extroverted. Introverted: cautious,
conservative; perhaps more alive to the role of
information resources in the scheme of things.
Extroverted: information is someone else’s prob-
lem.

4. Style of the CEO: autocratic/benign; delegators/
one-man shows. ‘Managers who really do believe in
the value of putting IM tools directly in the hands 
of end-users, instead of trying to make them jump 
through all kinds of hoops, to get at needed infor- 
mation resources, seem more prone to adopt the 
IM idea and make a success out of it. They see 
sharing as a positive and constructive objective, 





those at the top. Versus the notion gaining ground 
that you don’t browbeat or dictate any more, you 
negotiate, persuade and share as the best way to 
get people involved and committed. 

7. Impact in human terms. The system as a control 
versus the system as a tool. The person as a cog 
versus the person as a contributor. Technical sys- 
tems for delivering technical services or systems 
which do not hand all power to the machine but 
call for a balanced relationship between people 
and systems — socio-technical systems to achieve 
the organization’s objectives. 

Let us look now at the management of informa- 
tion and how it may be practically applied in the 
organization. 

There seem broadly to be two ways of imple-
menting information management in the organization. 

One takes the techniques, body of knowledge and
skills, and the processes of IM and then prescribes
them to the organization. The results of this approach
risk isolation in various departments, lack of integra-
tion and occasionally a failure to coordinate
wide-ranging information activities to achieve the
organization’s objectives. It also pays no heed to the 
organization’s culture. The other is to approach the 
problem from an organizational point of view and 
perform a functional analysis. This is the approach of 
Orna's Practical Information Policies’. 

Orna does not start with IM and all its parapher-
nalia. She starts with the organization's objectives,
its management style, its organizational structure 
and the relationship of all these to the current envi- 
ronment. In short she starts with the business and the 
power-base and the positioning of the organization. 

Briefly and far too simplistically Orna's thesis is 
as follows. 

1. What are the objectives of the organization? 

2. What functions are required to achieve these ob-
jectives?

3. What (information) activities are required to per-
form those functions?

4. What (people) skills are required to carry out these
activities?

5. What tools do the people need to harness and 
capitalize on those skills to perform those activi- 
ties? 


Two scenarios for IT 
Orna provides two alternative scenarios for IT, one 

System 2 can result in people and technology work-
ing together towards the organization's objectives
where the users of the system become involved in 
making the decisions on organizational change. 

As a control, the technical system targets high
productivity but leads to dissatisfaction and isola-
tion for staff. People become data-processors and the
‘structure’ of the system in which they are a cog leads
to repetitive, inflexible activities.

With a people-centred approach to IT the 
organization needs to become aware of the skills, 
knowledge, levels of expertise and potential of the 
people who work for it. In designing the system, the 
use and exploitation of those skills and abilities is 
central. And at the highest level, not at the lowest. 
People responsible for specific tasks should contrib- 
ute to defining how those tasks should be done and 
how they should be combined into jobs. The change 
process can be given over to explaining changes and
securing commitment to them. The uses of technol- 
ogy must be planned so that people may contribute 
what they are best at. The key to successful imple- 
mentation is a planned and continuous training 
programme with the people concerned involved in 
saying what the training should be. 

The technical system therefore should be posi- 
tioned not as a control but as a tool, an aid, a means 
of carrying out those activities which target the or- 
ganization's objectives. 

Orna's thesis puts people first in an organization 
which has recognized and valued the importance of 
getting IM right. The thesis, however, poses two
questions:

1. Will people be willing to accept the higher levels 
of responsibility and self-motivation that this ap- 
proach requires? 

2. Will the organization be receptive to IM as an 
apparently dominant organizational factor? 

There have traditionally been the managers and 
the managed and it is not merely organizational 
culture — whether it be structure, management style 
or despotic leadership — which has sought to main- 
tain this distinction. There is an enormous grey area 
between Theory X and Theory Y. Some may wish to, 
and be able to contribute more than others but the 
‘lesser’ contribution may not be unsatisfactory. Much 
depends on what the organization wants and what it 
is like. So we are back to culture. What kinds of
culture (in a little more detail than Schein’s A or B)
may be receptive to Orna’s IM task analysis
approach?

In his Understanding Organizations? Charles 
Handy points to four different types of Organiza- 
tional Culture: 

1. The Power Culture: 

This culture depends on a central power source 
with influence spreading out from the central fig- 
ure. If the centre chooses the right people, who 
think in the same way as the centre, they can be left 
to get on with the job. There are few rules and 
procedures, little bureaucracy; it depends for its
effectiveness on telepathy and personal conversa-
tion for communication. Such organizations are
usually small and extrovert and, being dependent on
the person at the centre, can move quickly in response
to threat, danger and opportunity.
Information and information management will get 
done as someone else’s problem or not at all. 

2. The Role Culture:
Stereotyped as a bureaucracy, its strength rests in 
defined functions and specialisms, with its roles, 
procedures and set job descriptions. The job de- 
scription is more important than the people who 
do it — the job description is evaluated — not the 
person — nor the job for that matter. Close to 
Schein's Organization B, the role culture works 
OK in a stable environment. But its people have 
specialist functions and probably not the skill or 
expertise to adapt to change. If the power culture is 
not information sensitive, then the role culture 
will fix and departmentalize the library operations 
as opposed to active and changing information 
sources.

3. The Task Culture: 

Job or project orientated with the emphasis on 
getting the job done. The matrix organization seeks 
to bring together the appropriate resources, right 
people at the right level of organization and let 
them get on with it. Influence is based more on 
expertise than on position. It is also a team culture
where the outcome or the product is the result of 
group work, groups which can be formed and 
dissolved as needs arise. The task culture is appro- 
priate when flexibility and sensitivity to the market 
or environment are important and information sys-
tems are fluid, adaptable and responsive.

4. The Person Culture: 

In this culture the individual is the central point. 
Not many medium or large-scale organizations 
can exist with this culture since organizations tend 
to have objectives over and above the collective 
objectives of those who compose them, and in this 
organization the interests of the individuals take 
precedence. Handy cites barristers’ chambers, 
architects’ and lawyers’ partnerships, and small con-
sultancies with this orientation.

To return to Orna's approach. Clearly it requires 
a receptive culture with willingness and support 
from the top. The management style must be able to 
embrace participation, disagreement and criticism as 
well as reappraisal and reorganization. 

Within the current framework of change, private
sector organizations are making a transition from the
role to the task culture. Librarianship failed to make
an impact on the role culture of many organizations,
including, most emphatically, its own within libraries
themselves. If Orna's approach, or any semblance of
it, is to have any influence, then it will arguably be
within the frame of a task culture.

But what if that transition is not taking place?
What if IM has no open door of change to help it on
its way? What if the organization is firmly fixed in a
power or role culture?

There are two answers. The first has already been
given. The second is to manage the culture of the
organization.


Can organizational culture be managed? 
Firstly, Orna’s people-centred, task analysis approach
has a much better chance than librarianship did. It
does so because it is part of the values and beliefs of
the organization in a way which the library, with its
inward focus on operations, never was. The values
and beliefs of Orna's approach are the very values
and beliefs of the organization. And values and
beliefs are central to culture.
Secondly, managing culture places the onus on
the manager to have the patience for the long, slow,
day-by-day process of explaining, presenting,
communicating, persuading, arguing, implementing,
providing evidence, demonstrating and proving the set
of key values and principles that most characterize
the desired culture.


In the final analysis, this has nothing to do with
librarianship, and very little with information in
itself, but everything to do with management skills
and good кашне расе.